New subject: RFC: Combined SURBL list details, phishing list ready

13 May 2004


It's time to return to the question of a combined SURBL list
again, mainly because David Hooton's anti-phishing list is now
ready.  At a little over a hundred entries the list is too small
to stand on its own, but it will probably grow.  So I'd like
to make it part of a combined list.
1.  We had discussed two strategies for the A records of a
combined list before:
I.  Separate A records:
    spammer.com  IN  A  127.0.0.1
                 IN  A  127.0.0.2
where the two addresses indicate that the domain is on two lists.
II.  Bitmasked address:
    spammer.com  IN  A  127.0.0.3
where .3 means it's on the two lists corresponding to the .1 and
.2 bits, and similarly for other lists in other bit positions.
In the first case resolution on spammer.com.combined-list-name-here.surbl.org
would give two separate addresses 127.0.0.1 and 127.0.0.2 and in
the second case it would give one result 127.0.0.3.  The querying
program would need to act accordingly.  For the bitmasked case
some SA code would need to be written or reused.  As mentioned
earlier, various other RBLs combine lists using either of these
two strategies.  Here's an example cited earlier of the bitmask
style:
http://opm.blitzed.org/info
...
Using the DNSBL
...
In opm.blitzed.org, the A record has an IP address of 127.1.0.x
where x is a bitmask of the types of proxy that have been
reported to be running on the host. The values of the bitmask
are as follows:
WinGate       1
SOCKS         2
HTTP CONNECT  4
Router        8
HTTP POST     16
So the code using a combined list could be made to detect
specific results, i.e., the specific list which triggered 
a matching A record could be determined, and not just that it
matched "all" or any from the original list.  On the other hand,
the fact that matches from any list occurred may be good enough
for some users.  Personally I prefer the more detailed explanation
that would come from being able to distinguish the source list,
but that's a question of the program implementation and not the
combined list itself.  The combined list itself would always
encode the source list, whether the querying program knew or
cared how to decode that or not.
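To make the decoding step concrete, here is a minimal sketch of how a querying program might handle either style of answer.  The bit assignments are placeholders for illustration only (the phishing bit in particular has not been decided), and the function name is made up.  It assumes that under strategy I each list's address ends in a distinct power of two, so that ORing the last octets of separate A records gives the same mask as a single bitmasked record.

```python
# Hypothetical bit assignments; none of these have been settled.
LIST_BITS = {
    1: "ws/be",     # assumed: sa-blacklist and BigEvil share the .1 bit
    2: "phishing",  # assumed position for the anti-phishing list
}


def decode_answers(addresses):
    """Return the source lists encoded in the A records for one domain.

    Works for one bitmasked record (["127.0.0.3"]) and for separate
    records (["127.0.0.1", "127.0.0.2"]) by ORing the last octets.
    """
    mask = 0
    for address in addresses:
        octets = address.split(".")
        if len(octets) != 4 or octets[0] != "127":
            raise ValueError("not a 127.x.x.x result: %r" % (address,))
        mask |= int(octets[3])
    # Report every list whose bit is set in the combined mask.
    return [name for bit, name in sorted(LIST_BITS.items()) if mask & bit]


if __name__ == "__main__":
    print(decode_answers(["127.0.0.3"]))
    print(decode_answers(["127.0.0.1", "127.0.0.2"]))
```

A program that only cares whether any list matched can skip the decoding and just test whether an answer came back at all.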
2.  Another question would be the name of the combined list.  Since
there would be three or more lists, someone had suggested a name
of "all" before.  That sounds good to me unless there are other
suggestions.
3.  I'm assuming TXT records are no longer really feasible in a
combined list and that descriptive messages will need to be
signalled by the list (127. address) matched.  I suppose it would
be possible to create custom TXT records for every entry, but a
generic TXT (or perhaps none) might be more likely.  Is a generic
TXT better than none?  Even in a BIND file, where it incurs some
use of space?
4.  TTLs: If an entry has matches on more than one list, should
it get a unique TTL?  If so, should such a custom TTL on the
multiply-matching entry be the longest TTL or the shortest TTL?
I lean towards inheriting the shortest TTL from the matching
source list, plus setting a default TTL for the combined zone
file to be near the longest.
Am I right in thinking that TTLs are largely irrelevant for
rbldnsd, since it reloads zone info whenever the files change?
In other words, does the rbldnsd cache clear for a given zone
when the zone reloads, or do the cached entries with TTLs longer
than the last reload interval remain in the cache?  (I'm kind of
hoping for the simpler case of rbldnsd clearing whenever
reloading.)
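The shortest-TTL rule I'm leaning towards could be sketched roughly as follows.  The per-list TTL values are invented for the example, and the function names are hypothetical.  The zone default here is simply the longest source TTL, which is one reading of "near the longest".

```python
# Hypothetical per-list TTLs in seconds; the real values are not decided here.
LIST_TTLS = {"ws": 3600, "be": 21600, "phishing": 900}


def entry_ttl(matched_lists, list_ttls=LIST_TTLS):
    """TTL for one entry: the shortest TTL among the lists it appears on."""
    return min(list_ttls[name] for name in matched_lists)


def zone_default_ttl(list_ttls=LIST_TTLS):
    """Default TTL for the combined zone: the longest source-list TTL."""
    return max(list_ttls.values())


if __name__ == "__main__":
    # An entry on both ws and phishing inherits the shorter of the two TTLs.
    print(entry_ttl(["ws", "phishing"]))
    print(zone_default_ttl())
```

With this rule an entry never stays cached longer than its most volatile source list would allow.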
5.  We will likely want to combine the ws and be lists into a
single entry in a combined list, probably using the .1 bit for
both of them, since both lists contain the enumerated
(non-wildcarded) domains from SA regular expressions.  Also,
things are moving towards combining the non-wildcarded domains
of sa-blacklist and BigEvil/MidEvil, so this would somewhat
short-circuit that process and future-proof things.
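As a rough illustration of folding ws and be into one bit, here is a sketch of how the combined data could be generated from the source lists.  The bit for the phishing list and the list contents are assumptions made up for the example, and the output is BIND-style A records using the bitmasked strategy.

```python
# Hypothetical bit assignments: ws and be share bit 1, phishing assumed at 2.
SOURCE_BITS = {"ws": 1, "be": 1, "phishing": 2}


def combine(source_lists):
    """Map each domain to the OR of the bits of every list it appears on."""
    combined = {}
    for list_name, domains in source_lists.items():
        bit = SOURCE_BITS[list_name]
        for domain in domains:
            key = domain.lower()
            combined[key] = combined.get(key, 0) | bit
    return combined


def zone_lines(combined):
    """Render one bitmasked A record per domain, sorted by name."""
    return ["%s  IN  A  127.0.0.%d" % (domain, mask)
            for domain, mask in sorted(combined.items())]


if __name__ == "__main__":
    sources = {
        "ws": ["spammer.com"],
        "be": ["spammer.com", "evil.example"],
        "phishing": ["spammer.com", "phish.example"],
    }
    for line in zone_lines(combine(sources)):
        print(line)
```

Because ws and be map to the same bit, a domain on both lists looks the same as a domain on either one, which is the point of merging them.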
Comments?
Jeff C.

[SURBL-Discuss] RFC: Combined SURBL list details, phishing list ready