On Thursday, September 9, 2004, 4:22:18 PM, Pete McNeil wrote:
On Thursday, September 9, 2004, 6:22:39 PM, Scott wrote:
SAC>> How does this sound? Combine spamtraps with SURBL, using the IP as a
SAC>> hint to fully automatically add on the new domain. If a spamtrap email
SAC>> includes a URL that resolves to a server that has the same IP as
SAC>> another server already on the SURBL blacklist, automatically and
SAC>> immediately add the new domain to SURBL. One could also use shared DNS
SAC>> servers as a similar hint. If a new domain in a spamtrap shares a DNS
SAC>> server with an already listed domain, add it to SURBL automatically.
I saw this passing by. Please don't do this. We are using SURBL as a research tool and we see too many false positives for this approach. Any time an FP domain is targeting a virtual web server you will run the risk of expanding that problem to reference all other web sites on that server. Don't get me wrong, it's a good idea (we use a similar mechanism internally to recurse through our domain lists). However, we have discovered that the data must be _extremely clean_ before allowing IP reference domain recursion.
My first pass at cleaning the resolved IP data would be to take the top 70th percentile of IP addresses and only use those to check domain resolved IPs against. It's not perfect, but it should cut down on the uncertainty.
SAC>> We should be a bit more careful than this --- require that a new URL
SAC>> has to resolve to the same IP address as, say, at least 3 other SURBL
SAC>> entries before being automatically added on. Also, there should also
SAC>> be a list of IP's for which this automatic logic won't be
SAC>> triggered. This would be important for a poorly run but popular
SAC>> virtual server that's slow at kicking off spamvertized sites.
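For concreteness, the rule Scott describes might be sketched roughly as follows. This is only an illustration, not anything SURBL actually runs: the function names, the dict-of-sets data layout, and the example addresses are all made up here, and DNS resolution is assumed to have happened elsewhere.

```python
from collections import defaultdict


def build_ip_index(listed):
    """Map each IP to the set of already listed domains resolving to it.

    `listed` is a hypothetical dict: domain -> set of resolved IP strings.
    """
    index = defaultdict(set)
    for domain, ips in listed.items():
        for ip in ips:
            index[ip].add(domain)
    return index


def should_auto_list(candidate_ips, ip_index, excluded_ips, min_matches=3):
    """Return True if a spamtrap domain qualifies for automatic listing.

    The domain qualifies when any of its resolved IPs is shared with at
    least `min_matches` already listed domains, ignoring IPs on the
    exclusion list (e.g. popular virtual servers).
    """
    for ip in candidate_ips:
        if ip in excluded_ips:
            continue  # automatic logic is never triggered for these IPs
        if len(ip_index.get(ip, ())) >= min_matches:
            return True
    return False


# Example with documentation-range addresses (not real data).
listed = {
    "a.example": {"192.0.2.1"},
    "b.example": {"192.0.2.1"},
    "c.example": {"192.0.2.1", "192.0.2.9"},
}
ip_index = build_ip_index(listed)
```

The same shape would work for the shared DNS server hint by indexing on name server instead of resolved IP.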
You've hit upon another hazard. Requiring 3 other SURBL domains is a good step - a better one is to require a certain age for a record... That is, if the record has been in place for long enough that an FP report would have easily knocked it out then you will probably be safe. The FPs that I'm catching in SURBL are usually reported very quickly - they don't go long without being noticed. If you wait 10 days or so you will be about 75% safe (off the top of my head).
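The age requirement could be expressed as a filter over the listed records before they are used as hints. A minimal sketch, assuming each record carries a timestamp for when it was added; the `listed_at` mapping and the function name are hypothetical, and the 10-day default is just the rough figure mentioned above.

```python
from datetime import datetime, timedelta


def aged_records(listed_at, now, min_age_days=10):
    """Return the listed domains that have survived at least `min_age_days`.

    `listed_at` is a hypothetical dict: domain -> datetime it was listed.
    Only these older records would be trusted as IP or DNS hints.
    """
    cutoff = now - timedelta(days=min_age_days)
    return {domain for domain, added in listed_at.items() if added <= cutoff}


# Example with made-up dates.
now = datetime(2004, 9, 9)
listed_at = {
    "old.example": datetime(2004, 8, 1),
    "new.example": datetime(2004, 9, 5),
}
trusted = aged_records(listed_at, now)
```

A shorter `min_age_days` trades some of that safety for faster reaction to new domains.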
Age cuts both ways. If we wait 10 days, the utility of the domain for some spammers may have gone away. I have statistics that show spammers use domains for less than 3 days on average.
I'm still tuning our AI so I can only tell you that you are on the right track and that you will want to watch the rates at which things are added and the FP rates and character - then tweak the rules you use to keep this process clean. When I started using this approach I thought I had an idea what would work - and I was more wrong than right until about the 3rd round of adjustments.
Would you care to share some of your strategies, perhaps off list?
Jeff C.