From mvbengro@xs4all.nl Fri May 7 22:52:30 2004
From: Menno van Bennekom
To: discuss@lists.surbl.org
Subject: Re: [SURBL-Discuss] Fwd: URI's not recognized
Date: Fri, 07 May 2004 22:52:24 +0200
Message-ID: <5844.80.126.122.158.1083963144.squirrel@webmail.xs4all.nl>
In-Reply-To: <021e01c43431$7c3dc000$2001a8c0@michaweb.net>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="===============4724200133872977561=="

--===============4724200133872977561==
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit

Thanks John!
This looks like a straightforward change. I'll try it on my test mail
server after the weekend (or sooner if I can't control myself).

Regards
Menno

> From: Menno van Bennekom
>> At first redirects like this were not recognized:
>> http://rd.yahoo.com*http://spammer.spam.biz
>> So I removed ^ from the BIZ expression:
>> uri BIZ_TLD /(?:https?:\/\/|mailto:)[^\/]+\.biz(?:\/|$)/i
>>
>> Still the following was not recognized:
>>
>> Because of the 3D (and other stuff spammers put there lately).
>> It is recognized only after changing 'uri BIZ_TLD' to 'body BIZ_TLD'.
>> But I use SpamCopURI and that also doesn't recognize URIs with things
>> in front of http.
>> And I can't tell SpamCopURI to use the 'body' check instead of 'uri'.
>> How can I make the URI subroutine recognize these URIs?
>> Would using SpamAssassin v3.0 help?
>
> Presumably it's not being picked up because http does not occur
> on a word boundary. I have a similar example which is picked up
> through SpamCopURI because the URL is correctly enclosed in
> double quotes.
>
> derserv.com?partid=3Darlenders">
>
> To pick up unquoted URLs preceded by quoted-printable
> characters (like =3D), a modification is required to the
> PerMsgStatus.pm SpamAssassin module, which doesn't
> currently decode quoted-printable characters before checking
> for URL patterns.
>
> If I add a call to MIME::QuotedPrint::decode_qp in get_uri_list
> then your example is correctly picked up.
>
> Here's a diff file of the changes I made to PerMsgStatus
> (which also deals with HTML-encoded characters and
> double http protocols).
>
> -----------------cut----------------------
> --- PerMsgStatus.pm.orig 2004-04-25 12:50:05.000000000 +0200
> +++ PerMsgStatus.pm 2004-05-07 14:33:55.000000000 +0200
> @@ -44,7 +44,8 @@
>  use Mail::SpamAssassin::Conf;
>  use Mail::SpamAssassin::Received;
>  use Mail::SpamAssassin::Util;
> -
> +use HTML::Entities;
> +use MIME::QuotedPrint;
>  use constant HAS

--===============4724200133872977561==--
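
For anyone who wants to see the idea outside of SpamAssassin, here is a rough Python sketch of the approach John describes: decode quoted-printable and HTML entities first, then look for URIs, then split "double http" redirects. It is not the Perl patch and not the real get_uri_list code. The names extract_uris, split_embedded, URI_RE and SCHEME_RE are made up for illustration, the \b in the pattern only mirrors John's guess that the match needs a word boundary, and stripping a trailing "*" from the redirector part is my own assumption.

```python
import html
import quopri
import re

# Hypothetical pattern: the leading \b mimics the presumed word-boundary
# requirement, so "3Dhttp://..." does not match until "=3D" is decoded.
URI_RE = re.compile(r"\b(?:https?://|mailto:)[^\s\"'<>]+", re.IGNORECASE)
SCHEME_RE = re.compile(r"https?://|mailto:", re.IGNORECASE)


def split_embedded(uri):
    """Split a URI that embeds further schemes (e.g. a redirector)."""
    starts = [m.start() for m in SCHEME_RE.finditer(uri)]
    ends = starts[1:] + [len(uri)]
    # Assumption: a trailing "*" is only the redirect separator, so drop it.
    return [uri[s:e].rstrip("*") for s, e in zip(starts, ends)]


def extract_uris(text):
    """Decode quoted-printable and HTML entities, then collect URIs."""
    # "=3D" becomes "=", which puts "http" back on a word boundary.
    raw = quopri.decodestring(text.encode("utf-8"))
    decoded = raw.decode("utf-8", errors="replace")
    # "&#58;" becomes ":", "&amp;" becomes "&", and so on.
    decoded = html.unescape(decoded)
    uris = []
    for match in URI_RE.finditer(decoded):
        uris.extend(split_embedded(match.group(0)))
    return uris


if __name__ == "__main__":
    print(extract_uris("href=3Dhttp://spammer.spam.biz/"))
    print(extract_uris("http://rd.yahoo.com*http://spammer.spam.biz"))
```

Without the decoding step, URI_RE finds nothing in the first sample, because "D" and "h" are both word characters and there is no boundary between them.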