Hi,
I've packaged SpamCopURI, but these files clash with SpamAssassin files.
/usr/lib/perl5/site_perl/5.6.1/Mail/SpamAssassin/Conf.pm
/usr/lib/perl5/site_perl/5.6.1/Mail/SpamAssassin/PerMsgStatus.pm
I assume that these normally replace the originally installed SpamAssassin files.
Is this correct, and if so, should I look at adding SpamCopURI to my
SpamAssassin package?
Regards,
Rob
--
Robert Brooks, Network Manager, Hyperlink Interactive Ltd
<robb(a)hyperlink-interactive.co.uk> http://hyperlink-interactive.co.uk/
Tel: +44 (0)20 7240 8121 Fax: +44 (0)20 7240 8098
- Help Microsoft stamp out piracy. Give Linux to a friend today! -
On Sunday, April 25, 2004, 11:55:33 AM, Charles Gregory wrote:
> In the case of the '.ca' TLD (www.circa.ca), the original hierarchical
> structure was: domain.city.prov.ca (e.g. fleanet.hamilton.on.ca).
> The 'ca' registry has since been opened up and many new '.ca' domains
> have the normal 2nd level format. But the registry still permits 3rd and
> 4th level domains under the existing provincial and municipality levels.
> I have no idea what safeguards are in place, if any, to prevent spammers
> from trying to register fourth level domains. I expect all they would have
> to do is have a registered place of business in the municipality.
> I do know that the named province/municipality does *not* have any control
> over the assignments.
Both the two- and three-level cases are handled: a two-level
domain can be anything.ca, as long as it is not listed as a known
geographic TLD like on.ca (for the province of Ontario), and a
three-level domain is handled when it falls under a known
two-level TLD, such as anything.on.ca.
A fourth level geographic domain, for example under a city name,
may not be handled properly, but could be added. If we drive
spammers to be desperate enough to use those then we've already
had great success. :-)
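Very roughly, the logic is something like this (a simplified sketch,
not the actual code; the geographic list here is just a few examples):

  # Simplified sketch of the two- vs. three-level handling described above.
  # %geo_tld holds known two-level geographic TLDs; the real list is longer.
  my %geo_tld = ('on.ca' => 1, 'qc.ca' => 1, 'bc.ca' => 1, 'ab.ca' => 1);

  sub base_domain {
    my ($host) = @_;
    my @parts = split /\./, lc $host;
    return $host if @parts < 2;
    my $two = join '.', @parts[-2, -1];       # e.g. on.ca or anything.ca
    if ($geo_tld{$two} && @parts >= 3) {
      return join '.', @parts[-3 .. -1];      # e.g. anything.on.ca
    }
    return $two;                              # e.g. anything.ca
  }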
Jeff C.
--
Jeff Chan
mailto:jeffc@surbl.org
http://www.surbl.org/
I have tested SpamCopURI 0.14 and SA 2.63 with my
collection of unparsed URLs. This new version deals
with many of the cases, so that the ugly workarounds
I was using can be removed.
By the way, if you're reading this Eric, it might be
worthwhile adding ads.msn.com and g.msn.com
to the list of known redirection services in the
sample spamcop_uri.cf.
Here are the cases that are not picked up:
1. URLs that aren't URLs (missing protocol, even
missing www)
<p>
P<advisory>l<aboveboard>e<compose>a<geochronology>s<moral>e<palfrey> <rada=
r>c<symptomatic>o<yankee>p<conduit>y<souffle> <intake>a<arise>n<eocene>d <=
thickish>paste <impact>this <broadloom>link <road>i<dichotomous>n<quinine>=
t<scoreboard>o y<eager>o<impact>ur b<archenemy>r<band>o<wallop>wser <b> he=
althyexchange.biz</b>
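An (untested) idea for catching these would be to undo the
quoted-printable soft breaks, strip the obfuscation tags, and then
look for bare hostnames, along these lines ($raw_body here stands
for the message body, and the TLD list is just an example):

  # Untested sketch for case 1; $raw_body is a placeholder for the body text.
  use MIME::QuotedPrint qw(decode_qp);

  my $text = decode_qp($raw_body);    # rejoin lines split with a trailing '='
  $text =~ s/<[^>]*>//g;              # crude removal of the obfuscation tags
  while ($text =~ /\b([a-z0-9-]+(?:\.[a-z0-9-]+)*\.(?:com|net|org|biz|info))\b/gi) {
    my $host = lc $1;
    # ... check $host against the SURBL lists here
  }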
2. Double protocol
http://http://www.eager-18.com/_7953f10b575a18d044cdec5a40bd4f22//?d=vision
Workaround in PerMsgStatus.pm:
$uri =~ s/http:\/\/http:\/\//http:\/\//gi;
(NB: compared to the previously published workaround, I added case insensitivity.)
3. HTML escape sequences in URL
http://toform.net/mcp/879/1352/cap112.html
Workaround in PerMsgStatus.pm:
use HTML::Entities;
$_ = HTML::Entities::decode($_);
(NB: this differs from the previously published workaround in that
it does the conversion earlier on, so it also handles the case
where the 'http' itself is encoded with escape sequences. It seems
to work despite the comment not to modify $_ in get_uri_list.)
Here's a diff of PerMsgStatus.pm with SpamCopURI 0.14
compared to the version with the workarounds mentioned
above.
John
diff -u PerMsgStatus.pm.orig PerMsgStatus.pm
-----------cut-------------
--- PerMsgStatus.pm.orig 2004-04-25 12:50:05.000000000 +0200
+++ PerMsgStatus.pm 2004-04-25 13:01:11.000000000 +0200
@@ -44,6 +44,7 @@
use Mail::SpamAssassin::Conf;
use Mail::SpamAssassin::Received;
use Mail::SpamAssassin::Util;
+use HTML::Entities;
use constant HAS_MIME_BASE64 => eval { require MIME::Base64; };
@@ -1748,6 +1749,7 @@
for (@$textary) {
# NOTE: do not modify $_ in this loop
+ $_ = HTML::Entities::decode($_);
while (/($uriRe)/go) {
my $uri = $1;
@@ -1776,6 +1778,7 @@
$uri = "${base_uri}$uri";
}
}
+ $uri =~ s/http:\/\/http:\/\//http:\/\//gi;
# warn("Got URI: $uri\n");
push @uris, $uri;
-----------------cut---------------
Just a thought: given the dynamic nature of the sc.surbl.org data,
it might be useful to have a listing history for the SURBL+ checker
(http://www.rulesemporium.com/cgi-bin/uribl.cgi)
Something along the lines of
domain xxxx currently (listed|not listed) in sc.surbl.org
- added ddmmyyyy hhmm
- removed ddmmyyyy hhmm
- added ddmmyyyy hhmm
- removed ddmmyyyy hhmm
This way it would be possible to check whether old
domains are being left dormant until they work their
way out of the list and then being resurrected
later on.
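Implementation-wise it might be enough to log one timestamped
add/remove event per domain and replay that log for the lookup page.
Just a sketch of the idea; the file name and format are made up:

  # Sketch only: append one event per line, e.g. "example.com added 25042004 1301".
  use POSIX qw(strftime);

  sub log_event {
    my ($domain, $event) = @_;        # $event is 'added' or 'removed'
    open my $fh, '>>', 'surbl-history.log' or die "history: $!";
    print $fh join(' ', $domain, $event,
                   strftime('%d%m%Y %H%M', localtime)), "\n";
    close $fh;
  }

  sub history_for {
    my ($domain) = @_;
    open my $fh, '<', 'surbl-history.log' or return ();
    return grep { /^\Q$domain\E / } <$fh>;   # this domain's events, in order
  }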
Another thought was to make a separate historic list of
domains that are removed from sc.surbl.org when there
are no new reports. People who want to do more
aggressive filtering could also use the historic list in
addition to the other lists. The risk of FPs is still low
because I can't foresee there being many "reformed"
spammer domains.
John
> Indeed, I just realized an embarrassing screw-up on my part.
>
> There are two different programs with similar names and
> slightly different properties, including zone file differences:
>
> rbldns
>
> rbldnsd
>
> We are using rbldnsd zone files. I need to update the web site
> to reflect this and post an announcement.
>
As am I... I guess I didn't realize there was an rbldns either. In rbldnsd, you can wildcard domains:
From
# man rbldnsd
dnset  Set of (possibly wildcarded) domain names with associated A and
       TXT values. Similar to ip4set, but instead of IP addresses,
       data consists of domain names (not in reverse form). One domain
       name per line, possibly starting with a wildcard (either with
       star-dot (*.) or just a dot). An entry starting with an
       exclamation sign is an exclusion. A default value for all
       subsequent lines may be specified by a line starting with a colon.

       Wildcards are interpreted as follows:

       example.com
              only the example.com domain is listed, not subdomains
              thereof. Not a wildcard entry.

       *.example.com
              all subdomains of example.com are listed, but not
              example.com itself.

       .example.com
              all subdomains of example.com and example.com itself are
              listed. This is a shortcut: to list a domain name itself
              and all its subdomains, one may either specify two lines
              (example.com and *.example.com) or one line (.example.com).
Instead of listing FQDNs, why not just list the domain with a dot in front? For example, list .mailnotice.biz instead of t.mailnotice.biz, in case they change to some other letter in front of their domain.
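E.g. a dnset data file along these lines would cover mailnotice.biz
and all of its subdomains (the default-value line syntax below is
from memory, so double-check it against the man page):

  :127.0.0.2:Blocked, see http://www.surbl.org/
  .mailnotice.biz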
Dallas
Hi all!
I've successfully installed rbldnsd and want to set it up along with
rsync'ing of sc/ws/be.surbl.org.
My question is how to set up the RBLDNSD variable in
/etc/sysconfig/rbldnsd and the rsync crontab.
So far I've got a tip on rsync:
Rsyncing is very easy. You would set up a cron job to
call something like:
rsync RSYNC_SERVER_NAME_GOES_HERE::surbl/sc.surbl.org.rbldns .
Seems simple enough. It's just that darn rbldnsd variable that's got me
confused.
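So far my best guess is that the variable just holds the daemon's
arguments, something like this (paths, addresses and zone names below
are placeholders, so please correct me if I'm off):

  # /etc/sysconfig/rbldnsd -- guessing that RBLDNSD holds the rbldnsd arguments
  RBLDNSD="-r /var/lib/rbldnsd -b 127.0.0.1 \
           sc.surbl.org:dnset:sc.surbl.org.rbldns \
           ws.surbl.org:dnset:ws.surbl.org.rbldns \
           be.surbl.org:dnset:be.surbl.org.rbldns"

  # crontab entry, e.g. every 30 minutes:
  */30 * * * * cd /var/lib/rbldnsd && rsync RSYNC_SERVER_NAME_GOES_HERE::surbl/sc.surbl.org.rbldns .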
--
Mvh,
Roger WJ Alterskjær
Edb-konsulent
Vitenskapsmuseet, NTNU
From: Jeff Chan
To: spamassassin-users
Date: Friday, April 23, 2004, 3:28:49 PM
Subject: Goofy domain names
===8<==============Original message text===============
On Friday, April 23, 2004, 7:38:46 AM, Chris Santerre wrote:
> This is where BigEvil may start going. I can change mine in 2 secs to use
> /\d00\dhosting/ but as soon as I do that, it will be removed from
> be.surbl.org. For obvious reasons they can't use wildcards. All signs point
> to me changing bigevil over to search for this kind of stuff, and simply adding
> any static ones I have to ws.surbl.org. But we'll see.
This is where our philosophies clash slightly.
SURBLs just want a list of known spam domains.
SA rulesets with wildcards try to match entire possible/probable
classes of domain names based on observing prior types of variation.
Both approaches have their merits.
For my purposes, I'd just prefer to get the domains that have
already been found in spam. I acknowledge that that doesn't
have the predictive value of the class approach, but it also
makes FPs less likely in principle. (Though in reality it's
not very likely that any legitimate sites are suddenly going to
start using rxmeds1.com, rxmeds2.com, rxmeds3.com, etc.)
Jeff C.
===8<===========End of original message text===========
Indeed, I just realized an embarrassing screw-up on my part.
There are two different programs with similar names and
slightly different properties, including zone file differences:
rbldns
rbldnsd
We are using rbldnsd zone files. I need to update the web site
to reflect this and post an announcement.
Jeff C.
--
On Friday, April 23, 2004, 4:23:48 PM, Chris Santerre wrote:
> Not sure, so I've cc'd the big guy on this one.
>>-----Original Message-----
>>From: Dallas L. Engelken [mailto:dallase@nmgi.com]
>>Sent: Friday, April 23, 2004 4:27 PM
>>To:
>>Subject: [RulesEmporium] Rbldns and wildcards
>>
>>
>>What is the recommended 'type' of zone for running bigevil via rbldns..
>>'dnset' ?
>>
>>If dnset is recommended, why aren't a lot of the entries in be.surbl.org
>>listed as
>>
>>.domain.com instead of domain.com to cover the TLD and all sub domains?
>>I probably missed this discussion, didn't I?
>>
>>
--
Jeff Chan
mailto:jeffc@surbl.org-nospam
http://www.surbl.org/
> [Linked from the surbl site also... -- Jeff C.]
Ok Jeff, you caught me in the middle of a mail asking for this
"lookup link", so where is the web cam you installed? ;-)
--
Mit freundlichen Grüßen / Yours sincerely
Dipl. Inform. Ralph Seichter
HORUS-IT
Ahornweg 10
D-57635 Oberirsen
Tel +49 2686 987880
Fax +49 2686 987889
http://horus-it.de/
On Tuesday, April 20, 2004, 10:51:55 PM, Eric Kolve wrote:
> I have just released SpamCopURI version 0.11. This fixes a few
> bugs that had been reported and adds open redirect resolution.
> This basically takes a URL from say rd.yahoo.com and attempts
> to resolve the Location header without ever fetching from
> the potentially spammy site.
> Only the URLs that have hosts that match an address list get
> redirect resolution. As well, redirect resolution is off
> by default, but can be enabled in the conf file. I have
> placed several open redirect sites in the conf file.
> The basic requirement is that the redirect return a 300-level
> HTTP response when fetching. I placed google.com
> in there even though they don't have their own redirect
> domain, but this should be fairly safe since most if not
> all google URLs are either redirects or searches. Give
> it a try and tell me what you think. This is all dependent
> upon LWP, but if you don't have LWP everything else
> will function as it did before.
Eric, you may want to share your redirection resolution
strategies with the 3.0 developers. I haven't heard Justin
getting beyond patterns yet. ;-)
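(For anyone curious, the general idea can be sketched with LWP along
these lines. This is not Eric's actual code; the redirector list and
the use of a HEAD request are assumptions on my part.)

  # Rough sketch: ask the redirector for its Location header without ever
  # following the redirect, so the destination site is never fetched.
  use LWP::UserAgent;
  use HTTP::Request;
  use URI;

  my %redirect_hosts = ('rd.yahoo.com' => 1, 'google.com' => 1);

  sub resolve_redirect {
    my ($uri) = @_;
    my $host = eval { URI->new($uri)->host } or return $uri;
    return $uri unless $redirect_hosts{lc $host};   # only known redirectors
    my $ua  = LWP::UserAgent->new(timeout => 5);
    my $res = $ua->simple_request(HTTP::Request->new(HEAD => $uri));
    return $uri unless $res->is_redirect;           # require a 3xx response
    return $res->header('Location') || $uri;        # the real destination
  }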
Jeff C.