On Thursday, September 9, 2004, 6:22:39 PM, Scott wrote:
SAC> On Thu, 9 Sep 2004 16:56:33 -0400, Chris Santerre
SAC> csanterre@MerchantsOverseas.com writes:
OK, this isn't the first time we've had this discussion, but Raymond and I felt this should be made public again. He ran through some tests of 1500+ domains and found the following data. It looks like they may be sending from zombies, and never from their own hosts. IPs are similar across the board.
So is there a way to use the IP info in a good way? Could SA or SURBL do a quick ping of the URL and match against a URL? This would allow us to simply list 1 IP instead of all these domains.
(I'm well aware of virtual hosts! So only the filthiest of spammers would be put on this IP list. Then their IP better boot them or anyone hosted on that box would feel the wrath of SURBL.)
SAC> How does this sound? Combine spamtraps with SURBL, using the IP as a
SAC> hint to fully automatically add on the new domain. If a spamtrap email
SAC> includes a URL that resolves to a server that has the same IP as
SAC> another server already on the SURBL blacklist, automatically and
SAC> immediately add the new domain to SURBL. One could also use shared DNS
SAC> servers as a similar hint. If a new domain in a spamtrap shares a DNS
SAC> server with an already listed domain, add it to SURBL automatically.
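For concreteness, the hint described in the quoted proposal could look roughly like the Python sketch below. Everything in it is an assumption made for illustration: the names `host_of`, `should_auto_list`, `resolve_ips` and `resolve_ns` are hypothetical, the sample addresses are made up, and none of it is SURBL's actual code. The lookups are passed in as plain functions so the logic can be tried without live DNS.

```python
from urllib.parse import urlsplit


def host_of(url):
    """Return the lowercased hostname of a URL, or None if it has none."""
    return urlsplit(url).hostname


def should_auto_list(url, listed_ips, listed_ns, resolve_ips, resolve_ns):
    """True if the URL's host shares an IP or a DNS server with a listed domain.

    resolve_ips / resolve_ns are caller-supplied lookups (hypothetical here)
    mapping a hostname to a set of IP addresses / nameserver names.
    """
    host = host_of(url)
    if host is None:
        return False
    shares_ip = bool(resolve_ips(host) & listed_ips)
    shares_ns = bool(resolve_ns(host) & listed_ns)
    return shares_ip or shares_ns


# Made-up data standing in for the blacklist and for DNS answers.
LISTED_IPS = {"192.0.2.10"}
LISTED_NS = {"ns1.spam-dns.example"}
FAKE_A = {
    "new-pills.example": {"192.0.2.10"},
    "innocent.example": {"198.51.100.7"},
}
FAKE_NS = {
    "new-pills.example": {"ns9.other.example"},
    "innocent.example": {"ns1.good.example"},
}


def fake_ips(host):
    return FAKE_A.get(host, set())


def fake_ns(host):
    return FAKE_NS.get(host, set())
```

With this data, a spamtrap URL on `new-pills.example` would be flagged because it shares an IP with a listed server, while `innocent.example` would not. Note that this bare form has no safeguards at all against virtual hosting.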
I saw this passing by. Please don't do this. We are using SURBL as a research tool and we see too many false positives for this approach. Any time an FP domain is targeting a virtual web server you will run the risk of expanding that problem to reference all other web sites on that server. Don't get me wrong, it's a good idea (we use a similar mechanism internally to recurse through our domain lists); however, we have discovered that the data must be _extremely clean_ before allowing IP-reference domain recursion.
SAC> We should be a bit more careful than this --- require that a new URL
SAC> has to resolve to the same IP address as, say, at least 3 other SURBL
SAC> entries before being automatically added on. Also, there should also
SAC> be a list of IP's for which this automatic logic won't be
SAC> triggered. This would be important for a poorly run but popular
SAC> virtual server that's slow at kicking off spamvertized sites.
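The two safeguards in the quoted text (a minimum number of already listed domains on the same IP, plus a list of IPs that never trigger the automatic logic) might be sketched in Python like this. The function names, the `exempt_ips` parameter and the data layout are assumptions for illustration only. The threshold of 3 is the figure from the quoted message.

```python
from collections import Counter

MIN_MATCHES = 3  # "at least 3 other SURBL entries", per the quoted proposal


def count_listed_per_ip(listed_domain_ips):
    """Map each IP to the number of listed domains that resolve to it.

    listed_domain_ips: dict of listed domain -> iterable of its IPs
    (a hypothetical layout, not SURBL's real data format).
    """
    counts = Counter()
    for ips in listed_domain_ips.values():
        counts.update(set(ips))  # count each domain once per IP
    return counts


def passes_threshold(candidate_ips, listed_domain_ips, exempt_ips,
                     min_matches=MIN_MATCHES):
    """True if any non-exempt candidate IP hosts >= min_matches listed domains."""
    counts = count_listed_per_ip(listed_domain_ips)
    return any(
        ip not in exempt_ips and counts[ip] >= min_matches
        for ip in candidate_ips
    )
```

A new domain would then be added automatically only when `passes_threshold` returns true for the IPs it resolves to. Putting a busy shared server's address in `exempt_ips` switches the logic off for that box entirely.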
You've hit upon another hazard. Requiring 3 other SURBL domains is a good step - a better one is to require a certain age for a record... That is, if the record has been in place for long enough that an FP report would have easily knocked it out then you will probably be safe. The FPs that I'm catching in SURBL are usually reported very quickly - they don't go long without being noticed. If you wait 10 days or so you will be about 75% safe (off the top of my head).
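As a rough illustration of the age requirement, here is a small Python sketch that keeps only records old enough to be used as references. The name `seasoned_domains` and the `listed_since` mapping are hypothetical. The 10-day figure is the off-the-cuff number above and would need tuning against observed FP rates.

```python
from datetime import datetime, timedelta

MIN_AGE = timedelta(days=10)  # "10 days or so"; an assumption to be tuned


def seasoned_domains(listed_since, now, min_age=MIN_AGE):
    """Return the listed domains whose records are at least min_age old.

    listed_since: dict of domain -> datetime the record was added
    (a hypothetical layout for illustration).
    """
    return {
        domain
        for domain, added in listed_since.items()
        if now - added >= min_age
    }
```

Only the domains returned here would count toward any shared-IP match. A record added yesterday, which an FP report may yet knock out, is ignored until it has aged.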
I'm still tuning our AI so I can only tell you that you are on the right track and that you will want to watch the rates at which things are added and the FP rates and character - then tweak the rules you use to keep this process clean. When I started using this approach I thought I had an idea what would work - and I was more wrong than right until about the 3rd round of adjustments.
My $0.02
_M

Pete McNeil (Madscientist)
President, MicroNeil Research Corporation
Chief SortMonster (www.sortmonster.com)