On Wed, 29 Sep 2004, Jeff Chan wrote:
On Wednesday, September 29, 2004, 6:03:32 PM, Rob McEwen wrote:
However, I still think my idea may be more efficient overall, because with each message some non-hits will sometimes be checked before the "guilty" URI is found. If a spammer adds a variety of non-spam URIs, either through deliberate poisoning or incidentally through other typical obfuscation or "mixing it up" techniques, then a large variety of URIs could end up being looked up that could have been avoided.
Or is the scenario I describe far-fetched and not representative of what would actually happen?
Rob McEwen
Most of the professional spams I've seen lately seem to have only the spammer's own domain in them.
Non-hits in general would be a potential problem, given that most domains occurring in messages are on neither our whitelists nor our blocklists. Fortunately, the negative caching function of DNS will cache the non-hits just as positive caching caches the hits.
The downside is that cached non-hits are subject to the negative caching TTL, so a given record will get re-queried against an authoritative server once its negative caching interval has passed.
We have already tuned the positive and negative caching TTLs experimentally to 15 minutes. This value appears to optimize both name server traffic and the latency of records entering and leaving the subdomains (i.e. the lists). It may not be useful to tune these further, especially since the local whitelist function will absorb a large chunk of the most common non-hits, such as yahoo.com, w3.org, etc.
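
As a rough illustration of the lookup order described above (local whitelist first, then cached answers, then a real query), here is a minimal Python sketch. It only models the behavior; it is not how any resolver or SpamAssassin actually implements it. The names CachingLookup, query_fn and remote_queries are hypothetical, and query_fn stands in for whatever performs the real DNS query. The single TTL of 15 minutes is the value mentioned above, applied to hits and non-hits alike.

```python
import time

TTL_SECONDS = 15 * 60  # the 15-minute positive/negative TTL mentioned above


class CachingLookup:
    """Toy model: local whitelist, then cache, then a real query."""

    def __init__(self, query_fn, whitelist, ttl=TTL_SECONDS, clock=time.monotonic):
        self.query_fn = query_fn        # hypothetical: domain -> True if listed
        self.whitelist = set(whitelist)
        self.ttl = ttl
        self.clock = clock              # injectable so expiry can be simulated
        self.cache = {}                 # domain -> (listed, expires_at)
        self.remote_queries = 0         # queries that reached the name server

    def is_listed(self, domain):
        domain = domain.lower()
        # Whitelisted domains never generate a query at all.
        if domain in self.whitelist:
            return False
        now = self.clock()
        entry = self.cache.get(domain)
        # Hits and non-hits are both served from cache until they expire.
        if entry is not None and entry[1] > now:
            return entry[0]
        listed = self.query_fn(domain)
        self.remote_queries += 1
        self.cache[domain] = (listed, now + self.ttl)
        return listed
```

With this model, a message full of unlisted "poison" domains costs one query per distinct domain per TTL interval, and domains on the local whitelist cost nothing.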
Jeff C.
If somebody wants to try a quick and dirty proof of concept on this without using the name servers and actual query volume, I'm guessing the AbuseButler volume information correlates quite closely with the DNS query volume for most domains (say a correlation in the neighborhood of 0.8 to 0.9, if I had to guess). I can produce data in just about any format you'd like, as all my data is in Postgres. Converting a daily volume report into a local BL would be fairly trivial for a quick proof of concept.
e.g.: http://spamvertised.abusebutler.com/spamvertised.php?rep=last24
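
For anyone wanting to try that conversion, a minimal Python sketch follows. The report format is purely an assumption: a CSV export with "domain" and "count" columns, which is not necessarily what the page above produces. The function name build_local_blocklist and the min_count cutoff are likewise hypothetical.

```python
import csv
import io


def build_local_blocklist(report_csv, min_count=100):
    """Return the domains from a volume report, highest volume first.

    Assumes a CSV with 'domain' and 'count' columns (a hypothetical export
    format). Domains with fewer than min_count reports are dropped.
    """
    reader = csv.DictReader(io.StringIO(report_csv))
    rows = [(row["domain"].strip().lower(), int(row["count"])) for row in reader]
    rows = [r for r in rows if r[1] >= min_count]
    # Sort by descending volume, then by name so ties are ordered stably.
    rows.sort(key=lambda r: (-r[1], r[0]))
    return [domain for domain, _ in rows]
```

The resulting list could then be checked locally before any DNS query is made, which is the proof of concept being proposed.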
-- Andy