[SURBL-Discuss] update on unparsed url types

John Fawcett johnml at michaweb.net
Sun Apr 18 15:29:56 CEST 2004

This is an update to yesterday's post on URLs which are not
currently being parsed by SpamAssassin (sa) version 2.63.

Further cases:

6. MSN redirection service (g.msn.com)

workaround for PerMsgStatus.pm
    $uri =~ s/^http:\/\/g\.msn\.com\/[^\*]+\?http\:(.*)$/http\:$1/g;
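
As a rough illustration of what this substitution does, here is a standalone sketch. The helper name strip_msn_redirect and the sample URL (redirect path and target domain) are invented for the example and do not come from SpamAssassin or from a real spam sample. The dots in the host name are escaped in this sketch so that they match only a literal dot.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of SpamAssassin: strips a g.msn.com
# redirect prefix and returns the target URL that follows the '?'.
sub strip_msn_redirect {
    my ($uri) = @_;
    # [^\*]+ consumes the redirect path; "?http:" marks the start of the target
    $uri =~ s/^http:\/\/g\.msn\.com\/[^\*]+\?http:(.*)$/http:$1/;
    return $uri;
}

# Invented sample URL; spammer.example is a placeholder domain
my $in = 'http://g.msn.com/0US!s5.31472_315529/HP.1001?http://spammer.example/offer';
print strip_msn_redirect($in), "\n";   # prints http://spammer.example/offer
```

A URL that does not start with the g.msn.com prefix passes through unchanged.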

7. Use of HTML escape sequences (character entities) in the URL
To translate these into the equivalent ASCII characters,
I have used HTML::Entities rather than reinvent the wheel.

workaround for PerMsgStatus.pm
    use HTML::Entities;
    $uri = HTML::Entities::decode($uri);
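
For illustration, a minimal sketch of the kind of URL this catches, assuming the HTML::Entities module from CPAN is installed. The sample URL is invented for the example. The sketch calls decode_entities, which is the documented function name in HTML::Entities; in scalar context it returns a decoded copy of its argument.

```perl
use strict;
use warnings;
use HTML::Entities qw(decode_entities);

# Invented sample: "http" and the dots written as numeric character references
my $uri = '&#104;&#116;&#116;&#112;://www&#46;spammer&#46;example/';

# Scalar assignment, so decode_entities returns the decoded copy
my $decoded = decode_entities($uri);
print "$decoded\n";   # prints http://www.spammer.example/
```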

Here is a cumulative diff containing the workarounds for these
and the previous cases. The diff is against PerMsgStatus.pm
2.63 already patched with SpamCopURI 0.09.

Hopefully someone can include these in version 3,
and more elegantly.

diff PerMsgStatus.pm.orig PerMsgStatus.pm
> use HTML::Entities;
>       dbg("Got URI: $uri");
>        $uri =~ s/\%68/h/g;
>        $uri =~ s/\%74/t/g;
>        $uri =~ s/\%70/p/g;
>        $uri =~ s/http:\/([^\/])/http:\/\/$1/g;
>        $uri =~ s/http:\/\/http:\/\//http:\/\//g;
>        $uri =~ s/^http:\/\/(?:drs|rd)\.yahoo\.com\/[^\*]+\*(.*)$/$1/g;
>        $uri =~ s/^http:\/\/g\.msn\.com\/[^\*]+\?http\:(.*)$/http\:$1/g;
>        $uri = HTML::Entities::decode($uri);
>       dbg("URI after filter: $uri");
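
To try the filter chain outside SpamAssassin, the substitutions in the diff can be wrapped in a small function. This is only a sketch under several assumptions: the name filter_uri is hypothetical, the sample inputs are invented guesses at what each substitution targets rather than real spam samples, the dots in the host names are escaped, and decode_entities (the documented HTML::Entities function name) is used for the decoding step.

```perl
use strict;
use warnings;
use HTML::Entities qw(decode_entities);

# Hypothetical wrapper around the substitutions from the diff
sub filter_uri {
    my ($uri) = @_;
    $uri =~ s/\%68/h/g;                        # %68 -> h
    $uri =~ s/\%74/t/g;                        # %74 -> t
    $uri =~ s/\%70/p/g;                        # %70 -> p
    $uri =~ s/http:\/([^\/])/http:\/\/$1/g;    # http:/x -> http://x
    $uri =~ s/http:\/\/http:\/\//http:\/\//g;  # collapse doubled scheme
    $uri =~ s/^http:\/\/(?:drs|rd)\.yahoo\.com\/[^\*]+\*(.*)$/$1/g;   # Yahoo redirect
    $uri =~ s/^http:\/\/g\.msn\.com\/[^\*]+\?http:(.*)$/http:$1/g;    # MSN redirect
    $uri = decode_entities($uri);              # HTML entities -> characters
    return $uri;
}

# Invented inputs; spammer.example is a placeholder domain
my @samples = (
    '%68%74%74%70://www.spammer.example/',
    'http:/www.spammer.example/',
    'http://http://www.spammer.example/',
    'http://rd.yahoo.com/some/path/*http://www.spammer.example/',
    'http://g.msn.com/x/y?http://www&#46;spammer&#46;example/',
);

# Each sample should come out as http://www.spammer.example/
print filter_uri($_), "\n" for @samples;
```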

