[SURBL-Discuss] Fwd: URI's not recognized
John Fawcett
johnml at michaweb.net
Fri May 7 15:47:38 CEST 2004
From: Menno van Bennekom <mvbengro at xs4all.nl>
> At first redirects like this were not recognized:
> http://rd.yahoo.com*http://spammer.spam.biz
> So I removed ^ from the BIZ expression:
> uri BIZ_TLD /(?:https?:\/\/|mailto:)[^\/]+\.biz(?:\/|$)/i
>
> Still the following was not recognized:
> <a href=3Dhttp://away.goingabroadd.biz/aps/cms/>
> Because of the 3D (and other stuff spammers put there lately).
> Only by changing 'uri BIZ_TLD' to 'body BIZ_TLD' it gets recognized.
> But I use SpamCopURI and that also doesn't recognize URI's with things in
> front of http.
> And I can't tell SpamCopURI to use the 'body' check instead of 'uri'.
> How can I make the URI subroutine recognize these URI's?
> Would using SpamAssassin v3.0 help?
Presumably it's not being picked up because "http" does not start
on a word boundary: in "=3Dhttp" the "D" runs straight into the "h".
I have a similar example which is picked up through SpamCopURI
because the URL is correctly enclosed in double quotes:
<a href=3D"http://rd.yahoo.com/winery/college/banbury/*http:/len=
derserv.com?partid=3Darlenders">
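To illustrate the word-boundary point, here is a small sketch in Python (not the Perl used by SpamAssassin). The pattern URI_RE is a simplified stand-in invented for this example, not SpamAssassin's actual $uriRe, and the quoted sample is shortened from the one above.

```python
import re

# Simplified stand-in for a URI pattern anchored on a word boundary.
# Assumption for illustration only; this is not SpamAssassin's $uriRe.
URI_RE = re.compile(r"\bhttps?://[^\s\"'<>]+", re.IGNORECASE)

unquoted = "<a href=3Dhttp://away.goingabroadd.biz/aps/cms/>"
quoted = '<a href=3D"http://rd.yahoo.com/winery/college/banbury/">'

# "D" and "h" are both word characters, so \b cannot match between them.
unquoted_match = URI_RE.search(unquoted)

# '"' is a non-word character, so \b matches right before "http".
quoted_match = URI_RE.search(quoted)

print(unquoted_match)         # None
print(quoted_match.group(0))  # http://rd.yahoo.com/winery/college/banbury/
```

With a pattern of this shape, the double quote is what separates the undecoded "3D" from the URL.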
To pick up unquoted URLs preceded by quoted-printable sequences
(like =3D), a modification is required to SpamAssassin's
PerMsgStatus.pm module, which doesn't currently decode
quoted-printable characters before checking for URL patterns.
If I add a call to MIME::QuotedPrint::decode_qp in get_uri_list
then your example is correctly picked up.
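The effect of decoding first can be sketched as follows, again in Python rather than Perl. The standard quopri module plays the role of MIME::QuotedPrint::decode_qp here; find_uris and URI_RE are hypothetical names for this example and are not part of SpamAssassin.

```python
import quopri
import re

# Simplified stand-in pattern (an assumption, not SpamAssassin's $uriRe).
URI_RE = re.compile(r"\bhttps?://[^\s\"'<>]+", re.IGNORECASE)

def find_uris(line):
    """Decode quoted-printable sequences, then scan the result for URIs."""
    decoded = quopri.decodestring(line.encode("latin-1")).decode("latin-1")
    return URI_RE.findall(decoded)

raw = "<a href=3Dhttp://away.goingabroadd.biz/aps/cms/>"

print(URI_RE.findall(raw))  # [] since "=3Dhttp" has no word boundary before "http"
print(find_uris(raw))       # ['http://away.goingabroadd.biz/aps/cms/']
```

Once "=3D" has been decoded to "=", the URL starts on a word boundary and the pattern matches.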
Here's a diff of the changes I made to PerMsgStatus.pm
(which also handle HTML-encoded characters and doubled
http:// protocols).
-----------------cut----------------------
--- PerMsgStatus.pm.orig 2004-04-25 12:50:05.000000000 +0200
+++ PerMsgStatus.pm 2004-05-07 14:33:55.000000000 +0200
@@ -44,7 +44,8 @@
use Mail::SpamAssassin::Conf;
use Mail::SpamAssassin::Received;
use Mail::SpamAssassin::Util;
-
+use HTML::Entities;
+use MIME::QuotedPrint;
use constant HAS_MIME_BASE64 => eval { require MIME::Base64; };
use constant MAX_BODY_LINE_LENGTH => 2048;
@@ -1748,6 +1749,8 @@
for (@$textary) {
# NOTE: do not modify $_ in this loop
+ $_ = HTML::Entities::decode($_);
+ $_ = MIME::QuotedPrint::decode_qp($_);
while (/($uriRe)/go) {
my $uri = $1;
@@ -1776,6 +1779,7 @@
$uri = "${base_uri}$uri";
}
}
+ $uri =~ s/http:\/\/http:\/\//http:\/\//gi;
# warn("Got URI: $uri\n");
push @uris, $uri;
-----------------cut----------------------
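One caveat about the patch: Perl's foreach aliases $_ to each element of @$textary, so assigning to $_ changes the stored text itself, which is what the NOTE comment in that loop warns against. A variant that decodes into a copy would avoid this. The Python sketch below shows the same three steps (HTML entity decoding, quoted-printable decoding, collapsing a doubled http://) applied to copies. The function names and the sample line are hypothetical, chosen only for illustration.

```python
import html
import quopri
import re

DOUBLE_HTTP_RE = re.compile(r"http://http://", re.IGNORECASE)

def normalize_for_uri_scan(line):
    """Return a decoded copy of the line; the caller's string is untouched."""
    text = html.unescape(line)  # e.g. "&#58;" becomes ":"
    # Characters outside latin-1 are replaced with "?" before QP decoding.
    raw = text.encode("latin-1", "replace")
    return quopri.decodestring(raw).decode("latin-1")

def collapse_double_protocol(uri):
    """Mirror the patch's s/http:\\/\\/http:\\/\\//http:\\/\\//gi substitution."""
    return DOUBLE_HTTP_RE.sub("http://", uri)

lines = ["<a href=3Dhttp&#58;//spammer.spam.biz/>"]
scan_copies = [normalize_for_uri_scan(line) for line in lines]

print(scan_copies[0])  # <a href=http://spammer.spam.biz/>
print(lines[0])        # original text is unchanged
print(collapse_double_protocol("http://http://spammer.spam.biz/"))
```

The order follows the patch (entities first, then quoted-printable); whether that order is right for every message is not something this sketch settles.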