I have just released SpamCopURI version 0.11. This fixes a few
bugs that had been reported and adds open redirect resolution.
It takes a URL from, say, rd.yahoo.com and attempts
to resolve the Location header without ever fetching from
the potentially spammy site.
Only URLs whose hosts match an address list get
redirect resolution. Also, redirect resolution is off
by default, but can be enabled in the conf file. I have
placed several open redirect sites in the conf file.
The basic requirement is that the redirect return a 300-level
HTTP response when fetched. I placed google.com
in there even though they don't have a dedicated redirect
domain, but this should be fairly safe since most, if not
all, Google URLs are either redirects or searches. Give
it a try and tell me what you think. This all depends
on LWP, but if you don't have LWP everything else
will function as it did before.
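The safe-resolution idea can be sketched as follows. This is a Python illustration of the policy, not SpamCopURI's actual Perl API, and the function name is made up:

```python
# Sketch of the open-redirect resolution policy: trust only the
# Location header of a 300-level response, and never fetch the body
# from the potentially spammy target.

def location_from_response(status, headers):
    """Return the redirect target if this is a 3xx response with a
    Location header, otherwise None (do not follow anything else)."""
    if 300 <= status < 400:
        return headers.get("Location")
    return None

# A typical open redirect answers something like:
#   302 Found, Location: http://spammer.example/
target = location_from_response(302, {"Location": "http://spammer.example/"})
```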
I have removed all the deprecated tests that depended on local Storable
data. See the INSTALL file for information about upgrading
from a previous version. There is also a bit more information
about installation that should help those who had trouble
in the past.
--eric
Here are some good comments from Dave Funk about the
handling/creation of the SURBLs. Please comment on his
suggestions, several of which we may want to implement as time
permits.
Jeff C.
__
On Tue, 20 Apr 2004, Jeff Chan wrote:
> On Tuesday, April 20, 2004, 1:20:05 PM, Charles Gregory wrote:
> > Would it be possible to have 'surbl.org' run a *combined* blacklist, so
> > that people who want to check both 'ws.surbl.org' *and* 'sc.surbl.org' can
> > do it with ONE dns lookup request, instead of two?
>
> Good question, which Matt asks also. Here's my response :-)
>
[snip..]
> > Because ws is larger and more stable, the zone files for it
> > get a six-hour TTL compared to 10 minutes for sc. Due to the
> > differences between the time scales, sizes, and data sources of
> > ws and sc, we probably won't be offering a combined ws plus sc
> > list. For example, it would be difficult to say what TTL a
> > merged list should get, and you probably would not want a
> > megabyte-plus BIND zone file refreshing every 10 minutes. For
> > those using rsynced zone files that would probably not be an
> > issue, but for those using BIND, the DNS traffic could well be.
>
> So the quick answer is they'll probably not be combined.
>
> However we probably will offer a combined version of Bill's
> list and Chris' BigEvil list since they are more similar in
> character.
A few comments.
1) It is possible to set a TTL in a DNS zone on a per-record basis
(at least with BIND), so you could combine the two zones and
flag the 'sc' records with a short TTL and the 'ws' records
with a longer one.
2) Newer versions of BIND support incremental zone-transfer, and
so will just push changes.
3) We also secondary the MAPS RBL+ zone; that's a 54-Mbyte zone that
updates every 3 hours (i.e., 18 Mbytes/hour). A 1-Mbyte zone every
10 minutes would be only 6 Mbytes/hour, chicken feed. ;)
4) Over half the size of those zones is in the TXT records. Just
changing 'Message body contains domain in sa-blacklist. See:
http://www.stearns.org/sa-blacklist/' to 'Blocked, See:
http://www.stearns.org/sa-blacklist/' reduced the 'ws' zone size by 33%
5) It's possible to combine the zones but keep the data logically separate
so people can differentiate and adjust scores/policies accordingly.
Check out how MAPS does RBL+: the A record returns an "IP address"
that is effectively a bit-mask flag indicating which MAPS zone
the original hit was from (DUL, RSS, RBL, OPS, etc).
Look at how the 'check_rbl' and 'check_rbl_sub' routines are
used inside SA to pull apart a single DNS query against RBL+
(at least in SA 2.6*, haven't looked at 3.0 yet ;)
This is not to imply criticism of your response, just some tech info
to show alternatives.
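Point 1's per-record TTLs can be illustrated with a hypothetical combined-zone fragment (the names and TTL values below are made up, just to show the syntax):

```
$TTL 21600                           ; zone default: 6 hours, ws-style
spammer-from-ws.com      IN A    127.0.0.2
fast-mover-from-sc.com   600 IN A    127.0.0.2   ; 10-minute TTL, sc-style
```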
Regardless, I would recommend using 5) when you combine Bill's list
and Chris' BigEvil so that people can differentiate in case they have
score/policy concerns regarding the two. People who just look for
the existence of the A record won't notice the difference but people
who know and care can utilize the additional info.
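The bit-mask decoding Dave recommends can be sketched like this (Python rather than SA's Perl; the bit assignments and list names are illustrative, not the real MAPS or SURBL values):

```python
# Decode a combined-list RBL answer: the last octet of the returned
# A record is a bit-mask saying which source list(s) the hit came from.

LISTS = {1: "sc", 2: "ws"}  # hypothetical bit -> sub-list assignments

def lists_from_answer(addr):
    """Turn an answer like '127.0.0.3' into the sub-lists that matched."""
    last_octet = int(addr.rsplit(".", 1)[1])
    return [name for bit, name in LISTS.items() if last_octet & bit]

# '127.0.0.3' has bits 1 and 2 set, so both hypothetical lists matched;
# clients that only check for the existence of an A record still work.
hits = lists_from_answer("127.0.0.3")
```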
Dave
--
Dave Funk University of Iowa
<dbfunk (at) engineering.uiowa.edu> College of Engineering
319/335-5751 FAX: 319/384-0549 1256 Seamans Center
Sys_admin/Postmaster/cell_admin Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{
--
Jeff Chan
mailto:jeffc@surbl.org-nospam
http://www.surbl.org/
> -----Original Message-----
> From: Simon Byrnand [mailto:simon@igrin.co.nz]
> Sent: Monday, April 19, 2004 8:10 PM
> To: Jeff Chan; SURBL Discuss
> Subject: Re: RFC: Removing example.com as a testpoint (Was: Re:
> [SURBL-Discuss] Re: No install Problems with 0.10)
*snip*
>Sounds fine to me....(since no one else is commenting :)
>The whole idea of surbl relies on the fact that there is no legitimate way
>that a spam url would be found in a genuine message. This isn't too hard
>considering how obscure and obfuscated most spammer domain names
>are...they're not the kind of thing that you'd write by accident in casual
>conversation :)
Oh, I wouldn't say that is 100% true!
The following are the rules I removed from BigEvil. They are "Shammers":
people who send spam AND ham. These are the domains I had to remove
even though they had sent lots of spam; the one ham-sending customer
they pick up forces your hand to remove them. SURBL will need to
create a list like this sometime soon.
uri BigEvilList_W /\b(?:exacttarget\.com|pandasoftware\.com)\b/i
uri BigEvilList_X /\b(?:12\.156\.5\.69|platinumromance\.com|quickresponder4u\.biz|globalcomputer\.com|emailfactory\.com|a801\.g\.akamai\.net|rsc03\.net|a904\.g\.akamai\.net|doubleclick\.speedera\.net|us\.i1\.yimg\.com|us\.js1\.yimg\.com|us\.news1\.yimg\.com|thestar\.com|tm0\.com|smartbargains\.com|terra\.es)\b/i
uri BigEvilList_Y /\b(?:destinationsite\.com|ccprod\.roving\.com|constantcontact\.com|akmaitech\.net|dobhran\.com|service\.bfast\.com|blockspamnow\.com|xmr3\.com|entertainment\.com|us\.a1\.yimg\.com|click\.atdmt\.com|adfarm\.mediaplex\.com|affiliatefuel\.com|ashnin\.com|bmgmusic\.com|clickserve\.cc-dt\.com)\b/i
uri BigEvilList_Z /\b(?:mayco\.com|ediets\.com|flowgo\.com|politechbot\.com|servedby\.advertising\.com|g\.trackbot\.com|promos\.hotbar\.com|home\.efax\.com|ipunda\.com|ld\.net|m0\.net|mybonuscenter\.com|OpenRate\.ymc0\.net|petcarerx\.com|pmd\.bz|qksrv\.net|registeredsite\.com|statik\.topica\.com)\b/i
Isn't this fun? :)
--Chris
forwarded from one DNSBLer...
> I'm running one of the proxies for openrbl.org. It's dead easy to set
> this up -- a copy of Pound, a dedicated IP address, and 5 minutes to
> write a 20 line config file. Pound helps "clean" the requests, and
> hides the real back-end server.
>
> The portion of openrbl.org I proxy uses under 10kbps on average, with a
> spike every few days for up to a few hours when someone tries to smack
> it. I run the IP through a 64kbps pipe with ipfw (gateway box runs
> FreeBSD) for extra warmfuzzies, and packet filter all but port-80 to the
> IP I've assigned.
>
> > [...] fancy posting to discuss(a)lists.surbl.org with tips?
>
> I'm at my quota for mailing lists -- if I subscribe to another, my nose
> will bleed. Pound is dead easy. I would venture to guess that someone
> who can't get it running probably shouldn't.
>
> Pound is at http://www.apsis.ch/pound/, or in ports/www/pound if you're
> FreeBSDing it.
Another tip from the SBL folks:
> I'm not even sure where the root SBL zone server is. All the public zone
> servers and AXFR feeds are separate. Query load is rather large, so
> sub-zones are being broken out to two levels, allowing for more
> nameservers to spread out the load. (Admins are encouraged to use
> close-by servers when possible.) Check "NS" records for
> "sbl.spamhaus.org".
>
> Probably goes without saying, but selecting a zone name that can be "end
> of lifed" when needed should be considered.
Also, someone else mentioned that the top-level zone, "surbl.org" for
example, may become the target. So that also needs secondaries.
--j.
Did you remember to copy the .cf file to your spamassassin rules folder? (I only mention this b/c I tend to forget to do the most obvious of steps).
If you run spamassassin -D -t on a test message, do you see it attempting to perform the queries?
-Charles
>>> Paul Barbeau <Paul(a)hypernet.ca> 04/13/2004 9:00:36 PM >>>
I guess I spoke too soon. The "make test" works, and it looks like the "make
install" works; however, when I send an email through with
"http://test.diliberatelybroken.surbl.org/" in it, nothing gets tagged. Is that the correct URL
to use to confirm it is working?
Any ideas?
Paul
_______________________________________________
Discuss mailing list
Discuss(a)lists.surbl.org
http://lists.surbl.org/mailman/listinfo/discuss
> > > I would suggest using a known TLD. I could imagine SpamAssassin
> > > or another product, if they don't already, including optimisations
> > > to avoid querying on things that *look* like URLs but
> can't possibly
> > > be; and putting in a special case for SURBL would be a bit silly
> > > if we can avoid it.
> >
> > That was my inclination too. A real TLD with a fake domain
> > should be the most "standard"... for an obscured domain. :-)
>
> Then, my suggestion would be something inside the surbl
> domain. Otherwise
> there is a risk (albeit remote) that someone could register
> the domain and
> have a problem.
>
sheesh... I don't see why this is still being debated. It seems pretty obvious that example.com is a bad choice, and that a name within the surbl.org domain would be a better test point. Now just pick one and announce it.
Next please...
This is an update to yesterday's post on URLs which are not
currently being parsed by sa in version 2.63.
Further cases:
6. MSN redirection service g.msn.com
workaround for PerMsgStatus.pm:
$uri =~ s/^http:\/\/g\.msn\.com\/[^\*]+\?http\:(.*)$/http\:$1/g;
7. use of HTML escape sequences in the URL
http://toform.net/mcp/879/1352/cap112.html
To translate these into the equivalent ASCII characters,
I have used HTML::Entities rather than reinvent the wheel.
workaround for PerMsgStatus.pm:
use HTML::Entities;
$uri = HTML::Entities::decode($uri);
Here is a cumulative diff containing the workarounds for these
and the previous cases. The diff is against PerMsgStatus.pm
2.63, already patched with SpamCopURI 0.09.
Hopefully someone can include these
in version 3, more elegantly...
diff PerMsgStatus.pm.orig PerMsgStatus.pm
----cut-------
45a47
> use HTML::Entities;
1777a1780,1789
> dbg("Got URI: $uri");
> $uri =~ s/\%68/h/g;
> $uri =~ s/\%74/t/g;
> $uri =~ s/\%70/p/g;
> $uri =~ s/http:\/([^\/])/http:\/\/$1/g;
> $uri =~ s/http:\/\/http:\/\//http:\/\//g;
> $uri =~ s/^http:\/\/(?:drs|rd)\.yahoo\.com\/[^\*]+\*(.*)$/$1/g;
> $uri =~ s/^http:\/\/g\.msn\.com\/[^\*]+\?http\:(.*)$/http\:$1/g;
> $uri = HTML::Entities::decode($uri);
> dbg("URI after filter: $uri");
----cut-------
On Monday, April 19, 2004, 5:32:32 PM, Devin Carraway wrote:
> Two suggestions:
> On Mon, Apr 19, 2004 at 04:52:03PM -0700, Jeff Chan wrote:
>> 4. Look up the domain name in the SURBL by prepending it to
>> the name of the SURBL, e.g., domainundertest.com.sc.surbl.org,
>> then doing Address record DNS resolution on the resulting
>> combined name. A non-result indicates lack of inclusion in the
>> list. A result of 127.0.0.2 represents inclusion, i.e., probable
>> spam.
> I suggest you clarify the record type(s) that should be looked up, and
> what the necessary/sufficient conditions are for considering a response
> a positive match. For example, some DNSBL client implementations only
> look for a TXT record, to obtain a useful explanation they can report in
> an error. Is that sufficient for a positive? Is an A record containing
> 127.0.0.2 necessary?
>> 5. Handle numeric IPs in URIs similarly, but reverse the octet
>> ordering before comparison against the RBL. This is standard
>> practice for RBLs. For example, http://1.2.3.4/ is checked as
>> 4.3.2.1.sc.surbl.org.
> Here you might want to stipulate that IPv4 octets should be looked up in
> their base-10 representation.
Thanks for the excellent suggestions, Devin. I've updated as
follows:
> 4. Look up the domain name in the SURBL by prepending it to the
> name of the SURBL, e.g., domainundertest.com.sc.surbl.org, then
> doing Address (A) record DNS resolution on the resulting
> combined name. A non-result indicates lack of inclusion in the
> list. A result of 127.0.0.2 represents inclusion, i.e.,
> probable spam. SURBL matches also have a TXT record associated
> with them containing a descriptive reason for list inclusion,
> but the A record is the preferred query.
>
> 5. Handle numeric IPs in URIs similarly, but reverse the octet
> ordering before comparison against the RBL. This is standard
> practice for RBLs. For example, http://1.2.3.4/ is checked as
> 4.3.2.1.sc.surbl.org. Numeric addresses should be in base-10
> representation.
How does that look?
Jeff C.
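The lookup rules in points 4 and 5 can be sketched as pure name construction (a Python illustration, not any particular client implementation; the actual A-record query would then go through a normal resolver):

```python
import re

def surbl_query_name(host, zone="sc.surbl.org"):
    """Build the name to resolve: reverse the octets of a numeric IP
    (standard RBL practice, base-10 octets), otherwise prepend the
    domain under test as-is."""
    if re.fullmatch(r"\d{1,3}(?:\.\d{1,3}){3}", host):
        host = ".".join(reversed(host.split(".")))
    return host + "." + zone

# domainundertest.com -> domainundertest.com.sc.surbl.org
# 1.2.3.4             -> 4.3.2.1.sc.surbl.org
```

An A-record answer of 127.0.0.2 for the resulting name indicates inclusion; no answer indicates the host is not listed.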
On Sunday 18 April 2004 04:55 pm, Theo Van Dinter wrote:
> come to think of this: have spammers used things like tinyurl yet?
No instances of TinyURL in any of the some 8,000 spams I've received in April,
but looking in the news.admin.net-abuse.sightings newsgroup
(http://groups.google.com/groups?hl=en&lr=lang_en&ie=UTF-8&oe=UTF-8&safe=off…)
shows over 300 examples of spam with TinyURL links.
--
Give a man a match, and he'll be warm for a minute, but set him on
fire, and he'll be warm for the rest of his life.
Advanced SPAM filtering software: http://spamassassin.org
I'd just like to summarize the current position with regard to the URL types
which are not currently parsed correctly by sa, and to ask for some help with
tests using version 3.
Yahoo offers a public redirection service. You can enter a URL like this:
http://rds.yahoo.com/*http://www.google.com
and you get sent to www.google.com. (By the way, I'm not sure what the
point of this is, because unlike tinyurl.com the Yahoo URL is longer.
However, it sure comes in handy to spammers who are trying to get past
sa URI rulesets.)
Spam which is not picked up correctly by sa URI filters often contains
redirection URLs, even though the redirected domain is in sc.surbl.org. Jeff
Chan has opened a bug against URIDNSBL.pm asking for support for parsing out
the spammer domain from redirected URLs:
http://bugzilla.spamassassin.org/show_bug.cgi?id=3261
Things are getting more complicated, because incoming spam seems to
contain features which keep it from being picked up even by an altered
parser that strips off the http://rds.yahoo.com/* part.
I wanted to summarize the current understanding of the URL types which
break parsing. I've tested these with SpamCopURI and version 2.63. If
someone offers to test (from case 2 onwards) with URIDNSBL and version 3,
I'll post suitable test cases.
1. http://rds.yahoo.com/*http://spammer.domain.tld/aaaaaaaaaa (bug 3261)
Workaround in PerMsgStatus.pm:
$uri =~ s/^http:\/\/(?:drs|rd)\.yahoo\.com\/[^\*]+\*(.*)$/$1/g;
2. http://rds.yahoo.com/*%68ttp://spammer.domain.tld/aaaaaaaa (follow-up to
bug 3261, including a test case)
(The other possible variations on this, which I haven't seen as yet, can use
%NN instead of any or all of the 'http' characters in the redirected domain,
e.g.
http://rds.yahoo.com/*%68%74%74%70://spammer.domain.tld/aaaaaaaa)
Workaround in PerMsgStatus.pm:
$uri =~ s/\%68/h/g;
$uri =~ s/\%74/t/g;
$uri =~ s/\%70/p/g;
3. http://rd.yahoo.com/winery/college/banbury/*http:/len=
derserv.com?partid=3Darlenders
The redirect URL is formally incorrect (there is a single slash
after http:), but browsers have no problem with this. The parser
cannot handle it.
Workaround in PerMsgStatus.pm:
$uri =~ s/http:\/([^\/])/http:\/\/$1/g;
By the way, this URL contains quoted-printable artifacts ('=' + newline
and '=3D') which are not causing problems for the parser. Neither is the
absence of a trailing slash before the '?' causing problems in parsing.
4. URLs without http:// in front of them. The following, seen in a browser,
reads:
"Please copy and paste this link into your browser healthyexchange.biz"
<p>
P<advisory>l<aboveboard>e<compose>a<geochronology>s<moral>e<palfrey> <rada=
r>c<symptomatic>o<yankee>p<conduit>y<souffle> <intake>a<arise>n<eocene>d <=
thickish>paste <impact>this <broadloom>link <road>i<dichotomous>n<quinine>=
t<scoreboard>o y<eager>o<impact>ur b<archenemy>r<band>o<wallop>wser <b> he=
althyexchange.biz</b>
Probably not much can be done with this.
5.
http://http://www.eager-18.com/_7953f10b575a18d044cdec5a40bd4f22//?d=vision
Here the double http:// prevents this from being parsed. (OK, it wasn't in
sc.surbl.org, but even if it had been, it wouldn't have been picked up.)
Workaround in PerMsgStatus.pm:
$uri =~ s/http:\/\/http:\/\//http:\/\//g;
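For readers following along outside PerMsgStatus.pm, here is a Python restatement of the workarounds above (plus the g.msn.com case from the earlier post). The regexes mirror the Perl patch, slightly broadened to also match rds.yahoo.com, and html.unescape stands in for HTML::Entities::decode:

```python
import html
import re

def normalize_uri(uri):
    """Undo the spammer obfuscations described above, in patch order."""
    # case 2: %NN-escaped characters in the redirected scheme
    uri = uri.replace("%68", "h").replace("%74", "t").replace("%70", "p")
    # case 3: single slash after http:
    uri = re.sub(r"http:/([^/])", r"http://\1", uri)
    # case 5: doubled scheme
    uri = uri.replace("http://http://", "http://")
    # case 1: strip the Yahoo redirect prefix
    uri = re.sub(r"^http://(?:drs|rds|rd)\.yahoo\.com/[^*]*\*(.*)$", r"\1", uri)
    # the g.msn.com redirect (case 6 in the earlier post)
    uri = re.sub(r"^http://g\.msn\.com/[^*]*\?(http:.*)$", r"\1", uri)
    # case 7: HTML entity escapes
    return html.unescape(uri)
```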
John