[SURBL-Discuss] Most often "hit" SURBL domains

Andy Warner andy at andy.net
Thu Sep 30 05:07:00 CEST 2004


On Wed, 29 Sep 2004, Jeff Chan wrote:

> On Wednesday, September 29, 2004, 6:03:32 PM, Rob McEwen wrote:
>
> > However, I still think my idea may be more efficient overall because some
> > non-hits will be checked with each message sometimes before the "guilty" URI
> > is found. If a spammer purposely adds a variety of non-spams, either through
> > purposeful poisoning, or incidentally via other typical obfuscation or
> > "mixing it up" techniques... then this could mean a large variety of URIs
> > looked up which could have been avoided?
>
> > Or, is this scenario I describe far-fetched and not representative of what
> > would actually happen?
>
> > Rob McEwen
>
> Most of the professional spams I've seen lately seem to have only
> the spammer's own domain in them.
>
> Non-hits in general would be a potential problem given that most
> domains occurring in messages are neither on our white nor block
> lists.  But fortunately the negative caching function of DNS will
> cache the non-hits just as positive caching caches the hits.
>
> The downside is that those are subject to the negative caching
> TTL, so they will get re-queried against an authoritative server
> after the negative caching interval passes for a given record.
>
> We have already tuned the positive and negative caching TTLs
> experimentally to 15 minutes.  This is a value that appears to
> optimize both name server traffic and latency of records entering
> and leaving the subdomains (i.e. the lists).  It may not be
> useful to tune these further, especially since the local
> whitelist function will help a lot with a large chunk of the
> most common negative caching of yahoo.com, w3.org, etc.
>
> Jeff C.
> --

If somebody wants to try a quick and dirty proof of concept on this
without using the name servers and actual query volume, I'm guessing
the AbuseButler volume information correlates quite closely with the
DNS query volume for most domains (a correlation in the neighborhood
of .8 to .9, if I had to guess). I can produce data in just about
any format you'd like, as all my data is in Postgres. Converting a
daily volume report into a local BL would be fairly trivial for a
quick proof of concept.

e.g.:  http://spamvertised.abusebutler.com/spamvertised.php?rep=last24
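
A minimal sketch of what such a conversion might look like, assuming
the daily report can be exported as (domain, report count) rows. The
sample rows, the function names, and the top_n cutoff are all
hypothetical placeholders, not the real AbuseButler schema or data,
and hosts are compared as-is rather than reduced to registered
domains the way a real SURBL client would:

```python
from urllib.parse import urlsplit

# Hypothetical daily volume report: (domain, reports in the last 24h).
# Placeholder rows only; not real AbuseButler output.
SAMPLE_REPORT = [
    ("spam-example-one.test", 950),
    ("spam-example-two.test", 410),
    ("spam-example-three.test", 12),
]


def build_local_bl(report, top_n=2):
    """Return the set of the top_n most-reported domains."""
    ranked = sorted(report, key=lambda row: row[1], reverse=True)
    return {domain.lower() for domain, _count in ranked[:top_n]}


def uri_domain(uri):
    """Return the lowercased host of a URI, or None if it has none."""
    return urlsplit(uri).hostname


def check_message(uris, local_bl):
    """Split URIs into local BL hits and those still needing a DNS query."""
    hits, need_dns = [], []
    for uri in uris:
        if uri_domain(uri) in local_bl:
            hits.append(uri)
        else:
            need_dns.append(uri)
    return hits, need_dns


if __name__ == "__main__":
    bl = build_local_bl(SAMPLE_REPORT)
    print(check_message(
        ["http://spam-example-one.test/buy", "http://www.w3.org/"], bl))
```

Anything that hits the local set never reaches the name servers; the
rest would go through the normal SURBL lookup, so the proof of concept
only has to count how many queries the local set absorbs.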

--
Andy


