[SURBL-Discuss] Re: [long] summary of currently unparsed url types

Loren Wilton lwilton at earthlink.net
Sat Apr 17 05:06:31 CEST 2004

Thanks for the rules fodder!

BTW, msn also has an open redirector that is seeing much use:

uri   LWTEST_REDIRECT1 m'http://g\.msn\.com/0AD0000[A-Z]/\d{6}\.1[/\?]'i
describe LWTEST_REDIRECT1 Open MSN redirector found in URL
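
A quick way to sanity-check a pattern like this outside SpamAssassin is
to run it against a few hand-made URLs. The sketch below is only an
illustration: it uses Python's re module rather than Perl, it escapes
the literal dots in the host name, and the sample URLs are invented for
the example (they are not taken from real spam).

```python
import re

# Python rendering of the LWTEST_REDIRECT1 pattern.
# re.IGNORECASE plays the role of the trailing 'i' flag on the rule.
MSN_REDIRECT = re.compile(
    r"http://g\.msn\.com/0AD0000[A-Z]/\d{6}\.1[/?]", re.IGNORECASE
)

def is_msn_redirect(url: str) -> bool:
    """Return True if the URL contains the MSN open-redirector form."""
    return MSN_REDIRECT.search(url) is not None

# Hypothetical sample URLs, made up for this example.
samples = [
    "http://g.msn.com/0AD0000X/123456.1??http://spammer.domain.tld/",
    "http://g.msn.com/0AD0000x/654321.1/redirect",
    "http://www.msn.com/",
]

for s in samples:
    print(is_msn_redirect(s), s)
```

The first two samples should match and the third should not.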


----- Original Message ----- 
From: "John Fawcett" <johnml at michaweb.net>
To: <spamassassin-dev at incubator.apache.org>; <discuss at lists.surbl.org>
Sent: Saturday, April 17, 2004 3:22 AM
Subject: [long] summary of currently unparsed url types

> I'd just like to summarize the current position with regard to url types
> which are not currently parsed correctly by sa and ask for some help with
> tests using version 3.
> Yahoo offers a public redirection service. You can enter a url like this:
> http://rds.yahoo.com/*http://www.google.com
> and you get sent to www.google.com. (By the way, I'm not sure what the
> point of this is, because unlike tinyurl.com the yahoo url is longer.
> However, it sure comes in handy to spammers who are trying to get past
> sa URI rulesets.)
> Spam which is not picked up correctly by sa uri filters often contains
> redirection urls, even though the redirected domain is in sc.surbl.org.
> Chan has opened a bug against URIDNSBL.pm to ask for support for parsing
> the spammer domain from redirected urls.
> http://bugzilla.spamassassin.org/show_bug.cgi?id=3261
> Things are getting more complicated, because the spam now coming through
> seems to contain features that keep it from being picked up even by an
> altered parser which strips off the http://rds.yahoo.com/* part.
> I wanted to make a summary of current understanding of the url types which
> break parsing. I've tested these with SpamCopURI and ver 2.63. If someone
> offers to test (from case 2 onwards)
> with URIDNSBL and version 3, I'll post suitable test cases.
> 1.http://rds.yahoo.com/*http://spammer.domain.tld/aaaaaaaaaa (bug 3261)
> Workaround in PerMsgStatus.pm:
>      $uri =~ s/^http:\/\/(?:rds|rd)\.yahoo\.com\/[^\*]*\*(.*)$/$1/g;
> 2.http://rds.yahoo.com/*%68ttp://spammer.domain.tld/aaaaaaaa (follow-up to
> bug 3261
> including test case)
> Other possible variations on this, which I haven't seen as yet, could use
> %NN in place of any or all of the 'http' characters in the redirected
> url, e.g.
> http://rds.yahoo.com/*%68%74%74%70://spammer.domain.tld/aaaaaaaa
> Workaround in PerMsgStatus.pm:
>          $uri =~ s/\%68/h/g;
>          $uri =~ s/\%74/t/g;
>          $uri =~ s/\%70/p/g;
> 3. http://rd.yahoo.com/winery/college/banbury/*http:/len=
> derserv.com?partid=3Darlenders
> The redirect url is formally incorrect (there is a single slash
> after http) but browsers have no problem with this. The parser
> cannot handle it.
> Workaround in PerMsgStatus.pm:
>     $uri =~ s/http:\/([^\/])/http:\/\/$1/g;
> By the way, this url contains quoted-printable sequences ('=' followed by
> a newline, and '=3D'), which are not causing problems for the parser.
> Neither is the absence of a slash before the '?'.
> 4. URLs without http: in front of them. The following, seen in a browser,
> reads:
> "Please copy and paste this link into your browser healthyexchange.biz "
> <p>
> P<advisory>l<aboveboard>e<compose>a<geochronology>s<moral>e<palfrey>
> r>c<symptomatic>o<yankee>p<conduit>y<souffle> <intake>a<arise>n<eocene>d
> thickish>paste <impact>this <broadloom>link
> t<scoreboard>o y<eager>o<impact>ur b<archenemy>r<band>o<wallop>wser <b>
> althyexchange.biz</b>
> Probably not much can be done with this.
> 5.
> Here the double http prevents this being parsed. (OK, it wasn't in
> sc.surbl.org, but even if it were, it wouldn't have been picked up.)
> Workaround in PerMsgStatus.pm:
>     $uri =~ s/http:\/\/http:\/\//http:\/\//g;
> John
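
The workarounds for cases 1, 2, 3 and 5 can be chained into a single
normalization pass. The sketch below is only an illustration under
several assumptions: it is written in Python rather than as a patch to
PerMsgStatus.pm, the function name normalize_redirect is hypothetical,
and the order of the steps is a guess (decoding %NN first so the later
patterns see a plain 'http'). The redirector pattern here accepts the
rds and rd host names and allows an empty path before the '*', so that
the case 1 URL matches. The case 3 input is written as it would look
after quoted-printable decoding, and the case 5 input is invented,
since the post does not show one.

```python
import re

def normalize_redirect(uri: str) -> str:
    """Apply the case 1, 2, 3 and 5 workarounds to a single URI."""
    # Case 2: decode %68, %74, %70 (h, t, p), as in the posted workaround.
    for code, char in (("%68", "h"), ("%74", "t"), ("%70", "p")):
        uri = uri.replace(code, char)
    # Case 1: drop the Yahoo redirector prefix up to and including the '*'.
    uri = re.sub(r"^http://(?:rds|rd)\.yahoo\.com/[^*]*\*(.*)$", r"\1", uri)
    # Case 3: repair a scheme written with a single slash.
    uri = re.sub(r"http:/([^/])", r"http://\1", uri)
    # Case 5: collapse a doubled scheme.
    uri = uri.replace("http://http://", "http://")
    return uri

tests = [
    "http://rds.yahoo.com/*http://spammer.domain.tld/aaaaaaaaaa",
    "http://rds.yahoo.com/*%68%74%74%70://spammer.domain.tld/aaaaaaaa",
    "http://rd.yahoo.com/winery/college/banbury/*http:/lenderserv.com?partid=arlenders",
    "http://http://spammer.domain.tld/",  # invented example for case 5
]

for t in tests:
    print(normalize_redirect(t))
```

Each input should come out as a plain http:// URL pointing at the
redirected host, which is the form a URI blocklist lookup needs.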
