Jeff Chan wrote:
On Thursday, September 14, 2006, 7:02:46 AM, Ron Guerin wrote:
Jeff Chan wrote:
Submission-time checks were the main intention.
That's what I thought, but I wanted some clarification. I'm the author of a redirector, created in October 2004, that has always used SURBL to check submitted URLs. Nevertheless, I find my database polluted with abuse. Even though most of it redirects to pages in Asian languages I don't understand, it's not hard to recognize spammers' landing pages. It also occurs to me that a submission-time check probably happens before any spam containing that URL has been sent, and therefore before the URL could appear in SURBL.
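For anyone following along, a submission-time SURBL check is just a DNS lookup. You prepend the URL's domain to the list zone and resolve it. An answer of the form 127.0.0.x means the domain is listed, and NXDOMAIN means it isn't. The following is a minimal Python sketch that assumes the combined `multi.surbl.org` zone. `base_domain`, `surbl_query_name` and `is_listed` are hypothetical helper names. The domain extraction is deliberately naive: it keeps only the last two labels, whereas real SURBL clients use the published two- and three-level TLD lists and also handle bare IP addresses. The exact return codes, including which bits mean which list, should be checked against SURBL's own documentation.

```python
import socket
from urllib.parse import urlsplit

SURBL_ZONE = "multi.surbl.org"  # assumed list zone; confirm against current SURBL docs


def base_domain(url: str) -> str:
    """Naively reduce a URL's host to its last two labels (hypothetical helper).

    Real SURBL clients consult two-/three-level TLD lists so that e.g.
    'spam.example.co.uk' maps to 'example.co.uk'; this sketch does not.
    """
    host = (urlsplit(url).hostname or "").rstrip(".").lower()
    return ".".join(host.split(".")[-2:])


def surbl_query_name(url: str, zone: str = SURBL_ZONE) -> str:
    """Build the DNS name to look up, e.g. 'example.com.multi.surbl.org'."""
    return f"{base_domain(url)}.{zone}"


def is_listed(url: str, resolve=socket.gethostbyname) -> bool:
    """Return True if the domain resolves inside the SURBL zone (127.x answer)."""
    try:
        answer = resolve(surbl_query_name(url))
    except socket.gaierror:
        # NXDOMAIN or a lookup failure: this sketch treats both as "not listed".
        return False
    return answer.startswith("127.")
```

Calling `is_listed("http://example.com/")` performs a live DNS query. The `resolve` parameter exists only so the lookup can be swapped out for testing.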
Makes sense.
Among the things I'm considering is re-checking accepted URLs a few hours later and flagging them for abuse if they come up with a hit. The other thing I think needs to be done is to follow redirects and count how many there are. I see a lot of URLs in my database that point to other redirection services.
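Following and counting redirects can be done one hop at a time. Request each URL without letting the HTTP client follow redirects, read the `Location` header, and stop at a non-redirect response, a loop, or a hop limit. Below is a minimal Python sketch using only the standard library. `head_location` and `follow_redirects` are hypothetical names, and the 10-hop limit is an arbitrary assumption. The sketch only sees HTTP-level redirects, not meta-refresh or JavaScript ones, and some servers answer HEAD differently from GET.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


def head_location(url, timeout=10.0):
    """Send one HEAD request without following redirects; return (status, Location)."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def follow_redirects(url, fetch=head_location, max_hops=10):
    """Return the list of URLs visited, starting with `url`.

    Stops at a non-redirect response, when a URL repeats (a loop), or after
    `max_hops` redirects. The hop count is len(result) - 1.
    """
    chain = [url]
    seen = {url}
    while len(chain) <= max_hops:
        status, location = fetch(chain[-1])
        if status not in REDIRECT_CODES or not location:
            break
        nxt = urljoin(chain[-1], location)  # Location may be relative
        chain.append(nxt)
        if nxt in seen:
            break  # redirect loop: record the repeated URL, then stop
        seen.add(nxt)
    return chain
```

`len(follow_redirects(url)) - 1` gives the hop count. Each domain in the returned chain could then be checked against SURBL as well, so a spam destination hidden behind another redirection service would still be caught.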
A re-check sounds reasonable. If you'd be doing a large volume of queries, you may want to consider using rsynced local versions of the zone files:
Won't be necessary. My redirector is primarily for people to install on their own sites. I run a public copy both to provide a demonstration and to gather some real-world usage data. I've got hundreds, not hundreds of thousands, of URLs from two years of operation, arriving at a rate of a few new URLs a day. I had considered slowly re-checking the entire database just to see what turned up, but I decided that checking today's SURBL against submissions made two years ago wouldn't tell me anything useful. I do think that re-checking submissions a few hours after acceptance might be useful. If it proves to be, I'll report back here.
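The delayed re-check amounts to a periodic pass over recent submissions, run from cron or similar. Below is a minimal Python sketch under stated assumptions. The `Submission` record and `recheck_due` function are hypothetical, and `check` stands in for whatever SURBL lookup the redirector already performs at submission time. The six-hour delay is just one reading of "a few hours". It might be worth re-checking more than once, since a single pass could still come before the domain is listed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Iterable, List, Optional


@dataclass
class Submission:
    """Hypothetical record for one shortened URL (timestamps timezone-aware, UTC)."""
    url: str
    accepted_at: datetime
    rechecked: bool = False
    flagged: bool = False


def recheck_due(subs: Iterable[Submission],
                check: Callable[[str], bool],
                delay: timedelta = timedelta(hours=6),  # assumed "a few hours"
                now: Optional[datetime] = None) -> List[Submission]:
    """Re-check submissions accepted at least `delay` ago; flag any now listed.

    Each record is re-checked once. Returns the submissions flagged on this pass.
    """
    now = now or datetime.now(timezone.utc)
    newly_flagged = []
    for sub in subs:
        if sub.rechecked or now - sub.accepted_at < delay:
            continue  # already re-checked, or not old enough yet
        sub.rechecked = True
        if check(sub.url):
            sub.flagged = True
            newly_flagged.append(sub)
    return newly_flagged
```

Flagged entries could then be disabled or queued for manual review, whichever suits the operator.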
I can offer one observation: although the service isn't promoted beyond simply existing on SourceForge, my public redirector has been discovered primarily by people seeking to conceal their true destination rather than by people wanting to shorten a URL. Given that my situation isn't typical and I have a relatively small dataset to work with, I'm hesitant to jump to conclusions, but I'd be interested in hearing from other redirector operators about what they're finding on their own services. From where I'm sitting, it looks like the bad outweighs the good, substantially.
- Ron