From: Menno van Bennekom mvbengro@xs4all.nl
At first, redirects like this were not recognized: http://rd.yahoo.com*http://spammer.spam.biz So I removed the ^ from the BIZ expression: uri BIZ_TLD /(?:https?:\/\/|mailto:)[^\/]+\.biz(?:\/|$)/i
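A minimal sketch of why dropping the anchor helps with the redirect. It assumes the original rule started with ^ right before the scheme group (the post only says the ^ was removed), and the variable names are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the two versions of the BIZ_TLD expression.
# Assumption: the original rule had ^ directly in front of the scheme group.
my $anchored   = qr{^(?:https?://|mailto:)[^/]+\.biz(?:/|$)}i;
my $unanchored = qr{(?:https?://|mailto:)[^/]+\.biz(?:/|$)}i;

my $redirect = 'http://rd.yahoo.com*http://spammer.spam.biz';

# Anchored: the host part "rd.yahoo.com*http:" contains no ".biz", so it fails.
# Unanchored: the match may start at the second "http://" instead.
printf "anchored:   %s\n", ($redirect =~ $anchored   ? 'match' : 'no match');
printf "unanchored: %s\n", ($redirect =~ $unanchored ? 'match' : 'no match');
```

The unanchored form also matches .biz hosts that appear anywhere later in a URI, which is the point here but makes the rule broader.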
Still the following was not recognized:
<a href=3Dhttp://away.goingabroadd.biz/aps/cms/> That is because of the 3D (and other stuff spammers have been putting there lately). It only gets recognized if I change 'uri BIZ_TLD' to 'body BIZ_TLD'. But I use SpamCopURI, and that also doesn't recognize URIs with things in front of http, and I can't tell SpamCopURI to use the 'body' check instead of uri. How can I make the URI subroutine recognize these URIs? Would using SpamAssassin v3.0 help?
Presumably it's not being picked up because http does not occur on a word boundary. I have a similar example which is picked up through SpamCopURI because the url is correctly enclosed in double quotes.
<a href=3D"http://rd.yahoo.com/winery/college/banbury/*http:/len= derserv.com?partid=3Darlenders">
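The word-boundary guess can be checked in isolation. The pattern below is a simplified, hypothetical stand-in for SpamAssassin's URI regex (the real $uriRe is not shown in this thread); the only feature it is meant to share is a leading \b. The quoted sample is a shortened form of the example above:

```perl
use strict;
use warnings;

# Simplified, hypothetical URI pattern; only the leading \b matters here.
my $uri_re = qr{\bhttps?://[^\s"'<>]+}i;

my $unquoted = '<a href=3Dhttp://away.goingabroadd.biz/aps/cms/>';
my $quoted   = '<a href=3D"http://rd.yahoo.com/winery/college/banbury/">';

for my $html ($unquoted, $quoted) {
    # In "3Dhttp" both D and h are word characters, so \b cannot match there.
    # In '"http' the quote is a non-word character, so \b matches before h.
    my ($uri) = $html =~ /($uri_re)/;
    print defined($uri) ? "found: $uri\n" : "nothing found in: $html\n";
}
```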
Picking up unquoted URLs preceded by quoted-printable characters (like =3D) requires a modification to the PerMsgStatus.pm SpamAssassin module, which doesn't currently decode quoted-printable characters before checking for URL patterns.
If I add a call to MIME::QuotedPrint::decode_qp in get_uri_list then your example is correctly picked up.
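As a rough illustration of the effect, outside SpamAssassin: decode_qp turns =3D back into =, and a boundary-anchored pattern can then see the URL. The pattern is again a simplified assumption, not the module's own regex:

```perl
use strict;
use warnings;
use MIME::QuotedPrint qw(decode_qp);

# Simplified, hypothetical URI pattern with a leading \b.
my $uri_re = qr{\bhttps?://[^\s"'<>]+}i;

my $raw     = '<a href=3Dhttp://away.goingabroadd.biz/aps/cms/>';
my $decoded = decode_qp($raw);    # "=3D" is decoded to "="

print "decoded: $decoded\n";

# "=" is a non-word character, so \b now matches right before "http".
my ($uri) = $decoded =~ /($uri_re)/;
print "found: $uri\n" if defined $uri;
```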
Here's a diff of the changes I made to PerMsgStatus.pm (which also handle HTML-encoded characters and doubled http protocols).
-----------------cut----------------------
--- PerMsgStatus.pm.orig 2004-04-25 12:50:05.000000000 +0200
+++ PerMsgStatus.pm 2004-05-07 14:33:55.000000000 +0200
@@ -44,7 +44,8 @@
 use Mail::SpamAssassin::Conf;
 use Mail::SpamAssassin::Received;
 use Mail::SpamAssassin::Util;
-
+use HTML::Entities;
+use MIME::QuotedPrint;
 use constant HAS_MIME_BASE64 => eval { require MIME::Base64; };
 
 use constant MAX_BODY_LINE_LENGTH => 2048;
@@ -1748,6 +1749,8 @@
 
 for (@$textary) {
 # NOTE: do not modify $_ in this loop
+ $_ = HTML::Entities::decode($_);
+ $_ = MIME::QuotedPrint::decode_qp($_);
 while (/($uriRe)/go) {
 my $uri = $1;
 
@@ -1776,6 +1779,7 @@
 $uri = "${base_uri}$uri";
 }
 }
+ $uri =~ s/http:\/\/http:\/\//http:\/\//gi;
 
 # warn("Got URI: $uri\n");
 push @uris, $uri;
-----------------cut----------------------
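For anyone who wants to try the idea without patching the module, here is a standalone sketch of the same steps. It is not the SpamAssassin code: extract_uris and the URI pattern are hypothetical, HTML entity decoding is left out so that only core modules are needed, and each line is decoded into a copy, since the loop in the module carries a note not to modify $_ and in Perl the loop variable is an alias for the array element.

```perl
use strict;
use warnings;
use MIME::QuotedPrint qw(decode_qp);

# Simplified, hypothetical URI pattern (not SpamAssassin's $uriRe).
my $uri_re = qr{\bhttps?://[^\s"'<>]+}i;

# Hypothetical helper: returns the URIs found in a list of body lines.
sub extract_uris {
    my @uris;
    for my $line (@_) {
        # Decode into a copy so the caller's line stays untouched.
        my $text = decode_qp($line);
        while ($text =~ /($uri_re)/g) {
            my $uri = $1;
            # Collapse a doubled scheme, as in the last hunk of the diff.
            $uri =~ s{http://http://}{http://}gi;
            push @uris, $uri;
        }
    }
    return @uris;
}

my @lines = ('<a href=3Dhttp://http://spammer.spam.biz/>');
print "$_\n" for extract_uris(@lines);
```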