On Wednesday, September 15, 2004, 3:18:39 AM, Jeff Chan wrote:
On Wednesday, September 15, 2004, 1:13:57 AM, Joe Wein wrote:
Alex Broens wrote:
What is your expiration algorithm?
None so far. As it's all very new, I'm still "collecting". On my local IP RBL I use 30 days; dunno yet what to use for URIs, but I assume at least 6 months would be safe.
Suggestions?
In my spamfilter I keep a "referenced" list of domains: Every time I have a positive lookup and the domain is not in the referenced list, I add it.
Then every couple of weeks I rename the referenced list file and start from scratch. By intersecting these limited timeframe usage lists with the larger blacklist I can later verify which domains were actively spamvertised during what period.
So I could do something like remove all entries added to the local blacklist between January and April that were not advertised between May and August, for example.
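A rough sketch of that intersection idea, in Python. The data layout is an assumption made for illustration: `blacklist_added` maps each domain to the date it was added, and each rotated "referenced" list is a `(start, end, domains)` tuple. The function name `stale_domains` is hypothetical and not part of any existing filter.

```python
from datetime import date

def stale_domains(blacklist_added, referenced_lists, added_range, active_range):
    """Return blacklist entries added within added_range that do not appear
    in any referenced list overlapping active_range.

    blacklist_added:  dict of domain -> date added (assumed layout)
    referenced_lists: iterable of (start, end, set_of_domains), one per
                      rotated "referenced" file (assumed layout)
    """
    added_start, added_end = added_range
    active_start, active_end = active_range

    # Union of every domain seen in a referenced list that overlaps
    # the "still active" window.
    active = set()
    for start, end, domains in referenced_lists:
        if start <= active_end and end >= active_start:
            active |= domains

    # Candidates for removal: added in the old window, not seen in the new one.
    return {
        domain
        for domain, added in blacklist_added.items()
        if added_start <= added <= added_end and domain not in active
    }


blacklist_added = {
    "a.example": date(2004, 2, 1),
    "b.example": date(2004, 3, 10),
    "c.example": date(2004, 6, 1),
}
referenced_lists = [
    (date(2004, 1, 1), date(2004, 1, 14), {"b.example"}),
    (date(2004, 5, 1), date(2004, 5, 14), {"a.example"}),
]
to_remove = stale_domains(
    blacklist_added,
    referenced_lists,
    added_range=(date(2004, 1, 1), date(2004, 4, 30)),
    active_range=(date(2004, 5, 1), date(2004, 8, 31)),
)
```

Here only "b.example" would be flagged: it was added in March but last referenced in January, while "a.example" was still being hit in May and "c.example" was added outside the January to April window.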
If we had activity data for how many times the DNS server returned a hit for which domains on a given day, we could also work out such heuristics.
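For instance, with per-day hit counts one possible heuristic is to expire anything that has had no hits for some idle period. The sketch below assumes the activity data arrives as `(day, domain, count)` records, which is a made-up format, and the names `last_hit_dates` and `expirable` are hypothetical. The 180-day default only mirrors the "6 months" guess earlier in the thread.

```python
from datetime import date, timedelta

def last_hit_dates(daily_hits):
    """Map each domain to the most recent day on which it had any hits.

    daily_hits: iterable of (day, domain, count) records (assumed format).
    """
    last = {}
    for day, domain, count in daily_hits:
        if count > 0 and (domain not in last or day > last[domain]):
            last[domain] = day
    return last

def expirable(blacklist, daily_hits, today, max_idle_days=180):
    """Return blacklisted domains with no recorded hit in the last max_idle_days.

    Domains with no hits at all are treated as idle forever, so a real
    implementation would probably also check when each entry was added.
    """
    last = last_hit_dates(daily_hits)
    cutoff = today - timedelta(days=max_idle_days)
    return {d for d in blacklist if last.get(d, date.min) < cutoff}


blacklist = {"x.example", "y.example", "z.example"}
hits = [
    (date(2004, 9, 1), "x.example", 5),
    (date(2004, 1, 1), "y.example", 3),
    (date(2004, 9, 10), "z.example", 0),
]
idle = expirable(blacklist, hits, today=date(2004, 9, 15))
```

With this sample data, "x.example" is kept because it was hit two weeks ago, while "y.example" (last hit in January) and "z.example" (no hits at all) come back as candidates for expiry.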
There is a rough count of SURBL DNS hits at:
It's only a sampling of about 32k queries over two days, so the source data is a little sparse, yet it gives some idea of which queries are getting hits.
And here are some perhaps more useful versions. Only the whitelist hits:
http://www.surbl.org/dns-queries.whitelist.counts.txt
And only the blocklist hits:
http://www.surbl.org/dns-queries.blocklist.counts.txt
If these are useful, we can increase the sampling to improve the quality of the data.
BTW does anyone see any obvious FPs in the blocklist hits? Or any pure spammers in the whitelist hits?
Jeff C.