I know that rd.yahoo.com (for example) is a URL redirector that spammers use to get around SURBL.
I also know that if I see a URL like rd.yahoo.com/?http://foo.bar.com, I should actually look up bar.com.multi.surbl.org.
Not all redirectors are this simple: some allow MIME-encoding or %-encoding of the target URL, others use different characters to separate the target URL from the redirector, etc.
Is there a fairly complete list of URL redirectors I could use, along with code (ideally Perl code) that converts/de-obfuscates "redirector URL" to "target URL"?
On Wednesday, December 20, 2006, 7:13:36 AM, Kelly Jones wrote:
> I know that rd.yahoo.com (for example) is a URL redirector that spammers use to get around SURBL.
> I also know that if I see a URL like rd.yahoo.com/?http://foo.bar.com, I should actually look up bar.com.multi.surbl.org.
SpamAssassin checks anything in a URL that looks like a domain. In the example above, it would parse out yahoo.com and bar.com, but it would look up only bar.com, since yahoo.com is on its local whitelist (the list of domains excluded from checking). Other programs may handle redirection sites differently.
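The behavior described above can be sketched roughly as follows. This is only an illustration in Python, not SpamAssassin's actual code: the whitelist contents, the function names, and the "last two labels" rule for finding the registered domain are all assumptions made for the example (the two-label rule is wrong for suffixes such as co.uk).

```python
import re

# Hypothetical local whitelist: domains that are never looked up.
WHITELIST = {"yahoo.com"}

# Loose pattern for anything that looks like a hostname.
HOST_RE = re.compile(r"(?:[a-z0-9-]+\.)+[a-z]{2,}", re.IGNORECASE)


def base_domain(host):
    # Assumption: the last two labels form the registered domain.
    # This is wrong for suffixes such as co.uk.
    return ".".join(host.lower().split(".")[-2:])


def domains_to_check(text):
    """Return non-whitelisted base domains found in text, in order, without repeats."""
    found = []
    for host in HOST_RE.findall(text):
        dom = base_domain(host)
        if dom not in WHITELIST and dom not in found:
            found.append(dom)
    return found


def surbl_query_names(text, zone="multi.surbl.org"):
    """Build the DNS names that would be queried for each domain."""
    return [f"{dom}.{zone}" for dom in domains_to_check(text)]


print(surbl_query_names("rd.yahoo.com/?http://foo.bar.com"))
```

With the example URL from this thread, yahoo.com is skipped because it is whitelisted, and only bar.com.multi.surbl.org is left to query.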
> Is there a fairly complete list of URL redirectors I could use, along with code (ideally Perl code) that converts/de-obfuscates "redirector URL" to "target URL"?
If you find any, please let us know.
Cheers,
Jeff C. -- Don't harm innocent bystanders.
Thanks, Jeff. My mistake here. My example was too simple.
I was thinking more of things where the "http://" is replaced with "http/0/", or the URL is hex-encoded or percent-encoded, and even redirectors that don't need "http://" in the target at all.
In other words, redirectors where the target isn't obviously a URL. Simple examples:
http://redirector.site/http/0/www.target.site/path [no "http://"]
http://redirector.site/www.target.site/path [no http at all]
http://redirector.site/target.site/path [hard to tell this is a redirect]
and even worse examples where the target site is "%23%26%29..." or whatever. Some redirectors "de-encode" %xx by accident, and spammers can use that to mask their domain name from SpamAssassin, etc.
It'd be good to have a list of redirectors so any URL could be "canonized" and then people could check the canonized URL against SURBL.
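To make the percent-encoding case concrete, here is a minimal Python sketch of one canonicalization step: decoding %xx escapes repeatedly until the string stops changing, so that a doubly encoded target is also exposed. The function name and the round limit are assumptions for illustration; this does not describe what any particular filter does.

```python
from urllib.parse import unquote


def fully_unquote(url, max_rounds=5):
    """Percent-decode url repeatedly until it no longer changes.

    max_rounds is an arbitrary safety bound chosen for this sketch.
    """
    for _ in range(max_rounds):
        decoded = unquote(url)
        if decoded == url:
            break
        url = decoded
    return url


# Single encoding: %77 is "w".
print(fully_unquote("http://redirector.site/%77%77%77.target.site/path"))
# Double encoding: %25 is "%", so %2577 becomes %77 and then "w".
print(fully_unquote("http://redirector.site/%2577ww.target.site/path"))
```

Both calls yield http://redirector.site/www.target.site/path, which can then be scanned for domains in the usual way.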
Kelly Jones wrote:
> Thanks, Jeff. My mistake here. My example was too simple.
> I was thinking more of things where the "http://" is replaced with "http/0/", or the URL is hex-encoded or percent-encoded, and even redirectors that don't need "http://" in the target at all.
> In other words, redirectors where the target isn't obviously a URL. Simple examples:
> http://redirector.site/http/0/www.target.site/path [no "http://"]
Any real examples of this that you can provide?
> http://redirector.site/www.target.site/path [no http at all]
Harder to cover. You're best off writing patterns on a domain-by-domain basis for the domains that provide such redirection. Of course, you could apply the patterns to all domains, but you'd be looking up lots of bogus names.
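A domain-by-domain approach might look something like the Python sketch below. The rule table is hypothetical: redirector.site and its path shapes are the placeholder examples from this thread, not a real service, and the function name is made up for illustration.

```python
import re
from urllib.parse import urlsplit

# Hypothetical per-redirector rules, keyed by redirector host.
# Each pattern is matched against everything after the host name.
REDIRECTOR_RULES = {
    "redirector.site": [
        # /http/0/www.target.site/path
        re.compile(r"^/http/0/(?P<target>[^/?#]+)"),
        # /www.target.site/path or /target.site/path
        re.compile(r"^/(?P<target>(?:[a-z0-9-]+\.)+[a-z]{2,})(?=[/?#]|$)", re.I),
    ],
}


def target_host(url):
    """Return the embedded target host, or None if no rule applies."""
    parts = urlsplit(url)
    rules = REDIRECTOR_RULES.get((parts.hostname or "").lower())
    if not rules:
        return None  # not a known redirector: nothing extra to look up
    rest = parts.path + ("?" + parts.query if parts.query else "")
    for rule in rules:
        match = rule.match(rest)
        if match:
            return match.group("target").lower()
    return None


print(target_host("http://redirector.site/http/0/www.target.site/path"))
print(target_host("http://redirector.site/target.site/path"))
print(target_host("http://example.org/target.site/path"))
```

Restricting the loose second pattern to known redirector hosts is what keeps the bogus lookups down. Even so, a path such as /index.html on the same host would still be mistaken for a target, which is the trade-off described above.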
> http://redirector.site/target.site/path [hard to tell this is a redirect]
Same thing.
> and even worse examples where the target site is "%23%26%29..." or whatever. Some redirectors "de-encode" %xx by accident, and spammers can use that to mask their domain name from SpamAssassin, etc.
Care to provide an example of such encoding? I'm not aware of any URL encoding that SpamAssassin fails to decode.
> It'd be good to have a list of redirectors so any URL could be "canonized" and then people could check the canonized URL against SURBL.
SpamAssassin includes redirector patterns for everything that we're aware of. Info about additional redirectors that aren't covered is, of course, always welcome.
Daryl
On Sunday, January 14, 2007, 4:24:17 PM, Kelly Jones wrote:
> Thanks, Jeff. My mistake here. My example was too simple.
> I was thinking more of things where the "http://" is replaced with "http/0/", or the URL is hex-encoded or percent-encoded, and even redirectors that don't need "http://" in the target at all.
> In other words, redirectors where the target isn't obviously a URL. Simple examples:
> http://redirector.site/http/0/www.target.site/path [no "http://"]
> http://redirector.site/www.target.site/path [no http at all]
> http://redirector.site/target.site/path [hard to tell this is a redirect]
Current SpamAssassin catches all of the above.
> and even worse examples where the target site is "%23%26%29..." or whatever. Some redirectors "de-encode" %xx by accident, and spammers can use that to mask their domain name from SpamAssassin, etc.
These too, as Daryl mentions.
> It'd be good to have a list of redirectors so any URL could be "canonized" and then people could check the canonized URL against SURBL.
Any list can be used by both the good guys and the bad guys.
Jeff C. -- Don't harm innocent bystanders.
I should add that a useful thing for redirection sites to do is to check the destination sites against SURBLs, as many of them are already doing. This tends to prevent abuse of their services:
http://www.surbl.org/redirect.html
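For a redirection site, the check could be as small as the Python sketch below. It assumes the usual DNS blocklist convention, where a listed name resolves to an address in 127.0.0.0/8 and an unlisted name fails to resolve. The function name and the injectable resolve parameter are inventions for this example; the page above is the place to look for the actual recommendations.

```python
import socket


def is_listed(domain, zone="multi.surbl.org", resolve=socket.gethostbyname):
    """Return True if domain appears to be listed in the given zone.

    resolve is injectable so the logic can be exercised without DNS.
    """
    try:
        answer = resolve(f"{domain}.{zone}")
    except socket.gaierror:
        # Name did not resolve: treated as "not listed" in this sketch.
        return False
    return answer.startswith("127.")


def allow_redirect(target_domain, **kwargs):
    # Refuse to redirect to destinations that appear to be listed.
    return not is_listed(target_domain, **kwargs)
```

A redirector would call allow_redirect() with the destination's registered domain before issuing the redirect, and refuse or show a warning page when it returns False.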
Jeff C. -- Don't harm innocent bystanders.