This is an update to yesterday's post on URLs which are not currently being parsed by SA in version 2.63.
Further cases:
6. MSN redirection service, g.msn.com
workaround for PerMsgStatus.pm: $uri =~ s/^http:\/\/g.msn.com\/[^\*]+\?http\:(.*)$/http\:$1/g;
7. use of HTML escape sequences in the URL http://toform.net/mcp/879/1352/cap112.html To translate these into the equivalent ASCII characters, I have used HTML::Entities rather than reinvent the wheel.
workaround for PerMsgStatus.pm use HTML::Entities; $uri = HTML::Entities::decode($uri);
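For anyone who wants to try cases 6 and 7 outside SpamAssassin, here is a minimal standalone sketch. The sub name unwrap_uri and the sample URLs are made up for illustration; only the two substitutions come from the workarounds above. It assumes HTML::Entities (part of HTML-Parser on CPAN) is installed, and it calls decode_entities, the documented name, which I would expect to behave the same as the decode call used in the workaround.

```perl
use strict;
use warnings;
use HTML::Entities ();

# Hypothetical helper, not part of SpamAssassin: applies the case 6 and
# case 7 workarounds to one URI string and returns the result.
sub unwrap_uri {
    my ($uri) = @_;
    # Case 6: drop the g.msn.com redirector prefix, keep the target URL.
    $uri =~ s/^http:\/\/g.msn.com\/[^\*]+\?http\:(.*)$/http\:$1/g;
    # Case 7: turn HTML entities (e.g. &#101;) back into plain characters.
    $uri = HTML::Entities::decode_entities($uri);
    return $uri;
}

# Made-up sample inputs.
print unwrap_uri('http://g.msn.com/abc?http://www.example.com/x'), "\n";
print unwrap_uri('http://www.&#101;xample.com/'), "\n";
```

Both calls should print a plain http://www.example.com/ style URL, which is what the later URI checks need to see.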
Here is a cumulative diff containing the workarounds for these and the previous cases. The diff is against PerMsgStatus.pm 2.63 already patched with SpamCopUri 0.09.
Hopefully someone can include these in version 3, and more elegantly.
diff PerMsgStatus.pm.orig PerMsgStatus.pm
----cut-------
45a47
> use HTML::Entities;
1777a1780,1789
> dbg("Got URI: $uri");
> $uri =~ s/\%68/h/g;
> $uri =~ s/\%74/t/g;
> $uri =~ s/\%70/p/g;
> $uri =~ s/http:\/([^\/])/http:\/\/$1/g;
> $uri =~ s/http:\/\/http:\/\//http:\/\//g;
> $uri =~ s/^http:\/\/(?:drs|rd).yahoo.com\/[^\*]+\*(.*)$/$1/g;
> $uri =~ s/^http:\/\/g.msn.com\/[^\*]+\?http\:(.*)$/http\:$1/g;
> $uri = HTML::Entities::decode($uri);
> dbg("URI after filter: $uri");
----cut-------
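To check the whole chain without patching a live install, the filter lines from the diff can be wrapped in a small script, roughly as below. This is only a sketch: filter_uri is a hypothetical name, the dbg calls are dropped, the sample URLs are invented, and decode_entities is used as the documented spelling of the decode call. HTML::Entities is assumed to be installed.

```perl
use strict;
use warnings;
use HTML::Entities ();

# Hypothetical wrapper around the filter lines from the diff (dbg calls dropped).
sub filter_uri {
    my ($uri) = @_;
    $uri =~ s/\%68/h/g;                          # %-escaped "h"
    $uri =~ s/\%74/t/g;                          # %-escaped "t"
    $uri =~ s/\%70/p/g;                          # %-escaped "p"
    $uri =~ s/http:\/([^\/])/http:\/\/$1/g;      # single slash after http:
    $uri =~ s/http:\/\/http:\/\//http:\/\//g;    # doubled http:// prefix
    $uri =~ s/^http:\/\/(?:drs|rd).yahoo.com\/[^\*]+\*(.*)$/$1/g;    # Yahoo redirector
    $uri =~ s/^http:\/\/g.msn.com\/[^\*]+\?http\:(.*)$/http\:$1/g;   # MSN redirector
    $uri = HTML::Entities::decode_entities($uri);                    # HTML entities
    return $uri;
}

# Made-up inputs, one per case; each should reduce to the same target URL.
my @samples = (
    '%68%74%74%70://www.example.com/',
    'http:/www.example.com/',
    'http://http://www.example.com/',
    'http://rd.yahoo.com/abc/*http://www.example.com/',
    'http://g.msn.com/abc?http://www.example.com/',
    'http://www.&#101;xample.com/',
);
for my $in (@samples) {
    my $out = filter_uri($in);
    my $status = $out eq 'http://www.example.com/' ? 'ok' : 'FAIL';
    print "$status  $in -> $out\n";
}
```

Adding a new obfuscation case then means one more substitution in filter_uri and one more line in @samples.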