At Daniel Quinlan's suggestion, we've started to check a sampling of SURBL name server queries against sbl and xbl.spamhaus.org. His interest is as a potential replacement for the very time consuming NS record lookups done with uridnsbl.
We haven't turned these into a SURBL yet, but probably will eventually. So far this has resulted in about 11k SBL domains with about 60% overlap with existing SURBLs. The fun thing is that this catches at a very early stage spam from scumbags like "Media Dreamland", who have been spamming free computer monitors, etc. lately. Operations of this type, which reuse the same name server IPs but register and change domains frequently, are caught this way, just as uridnsbl catches them, though perhaps with a few missed due to sampling effects on the DNS queries. This method also has a much lower global DNS overhead, since the lookups are done once in a centralized way and not repeatedly, on the same domains, in a gazillion SpamAssassin installations in a very distributed and redundant way.
The way this works is that we sample DNS queries from SURBL lookups, compare new wild domains (i.e. domains found in general email URIs) against xbl and sbl, and build up lists of the matches. (To be more correct, it's the IP addresses that the wild domains' name server "NS" records resolve to which are checked against sbl and xbl.) Along with this there will need to be expiration runs, which I haven't built yet. (In other words, domains should come off the lists when they no longer resolve or no longer resolve to name servers in sbl or xbl.)
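As a rough illustration of the matching step described above, here is a minimal Python sketch. It assumes the usual DNSBL convention of reversing the IPv4 octets and appending the zone name. The function names, the `is_listed` callback and the sample addresses are hypothetical and are not taken from the actual SURBL tooling. The DNS lookup itself is left to the caller so the sketch runs without network access.

```python
import ipaddress


def dnsbl_query_name(ip, zone="sbl.spamhaus.org"):
    """Build the DNSBL query name for an IPv4 address.

    Standard DNSBL convention: reverse the octets and append the zone,
    so 192.0.2.10 becomes 10.2.0.192.<zone>.
    """
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on bad input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def match_domains(ns_ips_by_domain, is_listed):
    """Return the domains with at least one name server IP that is listed.

    ns_ips_by_domain: dict mapping a wild domain to the IPs its NS records
                      resolve to (assumed to be collected elsewhere).
    is_listed:        hypothetical callable taking an IP and returning True
                      if the blocklist lookup for that IP succeeds.
    """
    return sorted(
        domain
        for domain, ips in ns_ips_by_domain.items()
        if any(is_listed(ip) for ip in ips)
    )
```

In use, `is_listed` would wrap a real DNS query for `dnsbl_query_name(ip)`, and the resulting domain list would then go through whitelisting and expiration.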
The main downside is that a list of domains whose name servers are listed in sbl or xbl will definitely have more false positives than our other SURBL lists. We'll want to do some testing, but the FP rate may be as high as 1%, so these lists would need to be used carefully.
Some other, perhaps interesting, stats after about two weeks:
unique queries logged so far: about 250k (these are reduced to base domains where easy)
SBL matches so far: about 11k
XBL matches so far: about 400

SBL: checked for NS records only
XBL: checked for NS, www, and base domain (but not MX)
Questions? Comments? Suggestions?
Jeff C. -- "If it appears in hams, then don't list it."
> we've started to check a sampling of SURBL name server queries against sbl and xbl.spamhaus.org.
Jeff,
For months now, I've been converting domains within messages to IP address and checking these (along with raw IP addresses) against "sbl-xbl.spamhaus.org". This was a final stage of filtering where almost all spam had already been caught. This way, I could audit these and not have a mountain of spam messages to audit.
From all of the "hands on" analysis that I've done, I have some suggestions.
1st, if you are converting domains to IPs and then checking these IPs against spamhaus, you may have to make sure your system can whitelist the domains **before** conversion to IP since the IPs can change without notice.
2nd, SpamHaus keeps listing the following: msn.click-url.com, (& variations) (These show up FREQUENTLY in hams, so I'd whitelist these up front. They seem to go in and out of SpamHaus intermittently.)
FOR EXAMPLE: msn.click-url.com = 216.39.69.75
http://www.spamhaus.org/query/bl?ip=216.39.69.75
...points to...
http://www.spamhaus.org/sbl/sbl.lasso?query=SBL20705
3rd, in fact, SpamHaus is going to list a lot of greymarketers that shouldn't be listed in SURBL (flowgo, euniverse, etc)
4th, most of the FPs I find in SpamHaus are XBL listings where the data source for that particular FP was http://cbl.abuseat.org/
CBL catches a LOT of spam... but it also periodically will list the mailserver of a respected ISP where that ISP had one user who sent out a bunch of spam and CBL then listed the IP address of that server. Unfortunately, this creates a lot of collateral damage. Recently, I experienced this with the BellSouth e-mail service of one of my client's customers. (I don't know the ratio of XBL stuff via CBL versus XBL stuff from other sources. I'd be curious to know this.)
Jeff, very likely, (I have a feeling) I've misunderstood your original intended use of SpamHaus? But maybe this information will be helpful anyway? I would definitely recommend NOT using the strategy I've described as an **automatic** way to get listed in SURBL. This would defeat MOST of the hard work we've done to minimize FPs. But, on the other hand, there are many great possibilities here for using this as a tool for evaluating URIs or as a honeypot for queuing URIs for evaluation where the URI wasn't already in SURBL.
Rob McEwen
On Wednesday, November 10, 2004, 5:25:43 AM, Rob McEwen wrote:
> 1st, if you are converting domains to IPs and then checking these IPs against spamhaus, you may have to make sure your system can whitelist the domains **before** conversion to IP since the IPs can change without notice.
Interesting. I was going to whitelist after detection. Whitelisting first would prevent some processing.
Note that we're not proposing making a list of IP addresses. The output is still mostly a list of domains.
> 2nd, SpamHaus keeps listing the following: msn.click-url.com, (& variations) (These show up FREQUENTLY in hams, so I'd whitelist these up front. They seem to go in and out of SpamHaus intermittently.)
> FOR EXAMPLE: msn.click-url.com = 216.39.69.75
> http://www.spamhaus.org/query/bl?ip=216.39.69.75
> ...points to...
> http://www.spamhaus.org/sbl/sbl.lasso?query=SBL20705
click-url.com is already manually whitelisted so it would not be on our version of the lists. We would likely apply the SURBL whitelisting to these lists.
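To make the "whitelist before resolving" idea concrete, here is a small Python sketch. It is only an illustration under stated assumptions: the whitelist contents, the function names, and the naive "last two labels" base-domain reduction are all hypothetical (real code would need a proper list of two-level TLDs such as co.uk), and none of it is the actual SURBL code.

```python
# Hypothetical whitelist of base domains; click-url.com is the example
# discussed in this thread.
WHITELIST = {"click-url.com"}


def base_domain(host):
    """Naively reduce a host name to its last two labels.

    This is an assumption made for brevity: it is wrong for domains
    under two-level TLDs (e.g. example.co.uk).
    """
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def filter_whitelisted(hosts, whitelist=WHITELIST):
    """Drop whitelisted hosts BEFORE any DNS resolution is attempted.

    Filtering on the name rather than the resolved IP means a change of
    IP address cannot cause a whitelisted domain to slip through, and it
    saves the lookups for those domains entirely.
    """
    return [h for h in hosts if base_domain(h) not in whitelist]
```

Only the hosts that survive `filter_whitelisted` would then be resolved and checked against sbl or xbl.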
> 3rd, in fact, SpamHaus is going to list a lot of greymarketers that shouldn't be listed in SURBL (flowgo, euniverse, etc)
That is one area where we disagree with Spamhaus, and we've whitelisted most of those since they appear in legitimate newsletters, etc. However, our whitelists of those domains may not be complete.
> 4th, most of the FPs I find in SpamHaus are XBL listings where the data source for that particular FP was http://cbl.abuseat.org/
> CBL catches a LOT of spam... but it also periodically will list the mailserver of a respected ISP where that ISP had one user who sent out a bunch of spam and CBL then listed the IP address of that server. Unfortunately, this creates a lot of collateral damage. Recently, I experienced this with the BellSouth e-mail service of one of my client's customers. (I don't know the ratio of XBL stuff via CBL versus XBL stuff from other sources. I'd be curious to know this.)
Queries into our DNS servers almost never match domains that resolve into XBL, which makes sense since those are mostly zombies. However a domain list of XBL hits may be a useful early warning of spammers starting to use zombies for hosting, DNS, etc, which fortunately they haven't done much yet. In practical terms the XBL hits are so few now as to be a non-issue.
(Really I just included XBL for completeness; SBL is generally more relevant for URIs, which is why it's what's used by uridnsbl in SpamAssassin by default. uridnsbl was probably designed with SBL in mind. If we do this, the SBL and XBL lists would be separate.)
> Jeff, very likely, (I have a feeling) I've misunderstood your original intended use of SpamHaus? But maybe this information will be helpful anyway? I would definitely recommend NOT using the strategy I've described as an **automatic** way to get listed in SURBL. This would defeat MOST of the hard work we've done to minimize FPs. But, on the other hand, there are many great possibilities here for using this as a tool for evaluating URIs or as a honeypot for queuing URIs for evaluation where the URI wasn't already in SURBL.
> Rob McEwen
The reason for looking at this is to find a way to avoid the DNS resolution on wild URI domains that uridnsbl does in SA 3. This process is an approximation of what uridnsbl does with sbl. I suspect that uridnsbl gets some false positives similar to what you notice in your own processing. Presumably uridnsbl is scored lower than SURBLs because of the FPs, and a SURBL version of the sbl data should probably also be scored lower than other SURBLs for similar reasons. Our whitelists would tend to reduce the FP rate somewhat, if applied, which seems likely.
I share your concerns about FPs. But since we're doing something very similar to what uridnsbl does, only with much less DNS overhead, the same FP concerns already apply to uridnsbl; this new way of doing things would just be much faster.
We have not turned this data into lists yet, but the reasons for considering it are as I describe: to bypass the very time consuming name resolution that uridnsbl does against domains in wild messages. It's meant to be a potential speedup/replacement for uridnsbl.
We should definitely discuss this more, and I'd like to hear from the SA developers.
Jeff C. -- "If it appears in hams, then don't list it."
on Wed, Nov 10, 2004 at 08:25:43AM -0500, Rob McEwen wrote:
> CBL catches a LOT of spam... but it also periodically will list the mailserver of a respected ISP where that ISP had one user who sent out a bunch of spam and CBL then listed the IP address of that server.
IME, it's not so much spam as virus-infected machines. One reason I continue to use CBL is that it keeps out 40% of the virus traffic I'd see otherwise - that the infected machines are often used as spam proxies is icing on the cake. And anything that encourages slacker mail admins to /stop emitting or proxying viruses/ is a good thing in my book. So I don't see what your problem is.
As for the issue of listing domains and IPs of known spammer domains, I've been doing this (listing IPs of found spammer domains and checking unknown domains against the IP blacklist) for several months and it's worked pretty well. In a nutshell, the spammers change IPs more slowly than they change domains. It's a useful check. But you have to be careful to expire those IPs from time to time, as they're subsequently reassigned (whether to other spammers or to legit businesses).
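The expiry point above can be sketched in a few lines of Python. This is a hedged illustration only, not anyone's production code: the class name, the "last seen" timestamp policy, and the `max_age_seconds` parameter are all assumptions, and the sample addresses come from the documentation ranges.

```python
import time


class ExpiringIPList:
    """A blacklist of IPs that forgets entries not re-confirmed recently.

    Assumption: an IP stays listed only while it keeps being seen behind
    spammer domains; once it has not been seen for max_age_seconds it is
    dropped, since it may have been reassigned.
    """

    def __init__(self, max_age_seconds):
        self.max_age = max_age_seconds
        self._last_seen = {}  # ip -> timestamp of the most recent sighting

    def add(self, ip, now=None):
        """Record (or refresh) a sighting of this IP."""
        self._last_seen[ip] = time.time() if now is None else now

    def expire(self, now=None):
        """Remove stale entries and return the IPs that were removed."""
        now = time.time() if now is None else now
        stale = [ip for ip, seen in self._last_seen.items()
                 if now - seen > self.max_age]
        for ip in stale:
            del self._last_seen[ip]
        return stale

    def __contains__(self, ip):
        return ip in self._last_seen
```

The `now` arguments exist only to make the behaviour easy to test; a periodic job would simply call `expire()` with no arguments.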
On Wednesday, November 10, 2004, 7:29:32 AM, Steven Champeon wrote:
> on Wed, Nov 10, 2004 at 08:25:43AM -0500, Rob McEwen wrote:
> > CBL catches a LOT of spam... but it also periodically will list the mailserver of a respected ISP where that ISP had one user who sent out a bunch of spam and CBL then listed the IP address of that server.
> IME, it's not so much spam as virus-infected machines. One reason I continue to use CBL is that it keeps out 40% of the virus traffic I'd see otherwise - that the infected machines are often used as spam proxies is icing on the cake. And anything that encourages slacker mail admins to /stop emitting or proxying viruses/ is a good thing in my book. So I don't see what your problem is.
For some additional info, the current checks against XBL are for the NS (name server) records, the unqualified domain, and www.domain.com. MX records are not being checked since they're less directly connected with URI domains.
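For what it's worth, the difference between the two sets of checks can be written down as a small table in Python. This is only a sketch of the checks as described in this thread; the names `CHECKS` and `lookup_targets` and the (record type, name) tuple layout are hypothetical.

```python
# Which records are resolved for each list, per the description in this
# thread: SBL gets NS only; XBL gets NS, the base domain, and www.
# MX is deliberately absent from both.
CHECKS = {
    "sbl": ("ns",),
    "xbl": ("ns", "base", "www"),
}


def lookup_targets(domain, zone):
    """Return (record_type, name) pairs to resolve for a domain and zone.

    The IPs obtained from these lookups are what would then be checked
    against the corresponding Spamhaus zone.
    """
    domain = domain.lower().rstrip(".")
    targets = {
        "ns": ("NS", domain),           # name servers of the domain
        "base": ("A", domain),          # the unqualified (base) domain
        "www": ("A", "www." + domain),  # the www host
    }
    return [targets[kind] for kind in CHECKS[zone]]
```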
Jeff C. -- "If it appears in hams, then don't list it."