It's time to return to the question of a combined SURBL list again mainly because David Hooton's anti-phishing list is now ready. The list is too small to be a separate list at a little over a hundred entries, but it will probably grow. So I'd like to make it part of a combined list.
1. We had discussed two strategies for the A records of a combined list before:
I. Separate A records:
spammer.com IN A 127.0.0.1
            IN A 127.0.0.2
where the two addresses indicate that the domain is on two lists.
II. Bitmasked address:
spammer.com IN A 127.0.0.3
where .3 means it's on the two lists corresponding to the .1 and .2 bits, and similarly for other lists in other bit positions.
In the first case, resolution on spammer.com.combined-list-name-here.surbl.org would give two separate addresses, 127.0.0.1 and 127.0.0.2, and in the second case it would give one result, 127.0.0.3. The querying program would need to act accordingly. For the bitmasked case some SA code would need to be written or reused. As mentioned earlier, various other RBLs combine lists using either of these two strategies. Here's an example cited earlier of the bitmask style:
Using the DNSBL
In opm.blitzed.org, the A record has an IP address of 127.1.0.x where x is a bitmask of the types of proxy that have been reported to be running on the host. The values of the bitmask are as follows:
WinGate        1
SOCKS          2
HTTP CONNECT   4
Router         8
HTTP POST     16
So the code using a combined list could be made to detect specific results, i.e., the specific list which triggered a matching A record could be determined, and not just that the domain matched "all" or any of the original lists. On the other hand, the fact that a match occurred on any list may be good enough for some users. Personally I prefer the more detailed explanation that would come from being able to distinguish the source list, but that's a question of the program implementation and not the combined list itself. The combined list itself would always encode the source list, whether the querying program knew or cared how to decode that or not.
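To make the decoding step concrete, here is a minimal Python sketch of what a querying program might do with the answers. It is only an illustration: the list names and bit assignments in LIST_BITS are placeholders, not a decided mapping, and real SA code would look different. It handles both strategies by ORing together the last octet of every A record returned, which assumes that under strategy I each separate address carries exactly one list's bit.

```python
import ipaddress

# Placeholder mapping of bit value -> source list name (assumption for
# illustration; the real assignments have not been decided).
LIST_BITS = {1: "list-1", 2: "list-2", 4: "list-4"}


def decode_answers(addresses):
    """Return the source lists encoded in the A records of one lookup.

    Works for both strategies: several single-bit addresses (strategy I)
    or one bitmasked address (strategy II).
    """
    mask = 0
    for address in addresses:
        # Only the last octet of the 127.0.0.x address carries list bits.
        mask |= int(ipaddress.IPv4Address(address)) & 0xFF
    return [name for bit, name in sorted(LIST_BITS.items()) if mask & bit]


def decode_bitmask(address):
    """Convenience wrapper for the single bitmasked answer of strategy II."""
    return decode_answers([address])
```

A program that only cares whether any list matched can simply test whether the returned list is non-empty.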
2. Another question would be the name of the combined list. Since there would be three or more lists, someone had suggested a name of "all" before. That sounds good to me unless there are other suggestions.
3. I'm assuming TXT records are no longer really feasible in a combined list and that descriptive messages will need to be signalled by the list (127. address) matched. I suppose it would be possible to create custom TXT records for every entry, but a generic TXT (or perhaps none) might be more likely. Is a generic TXT better than none? Even in a BIND file, where it incurs some use of space?
4. TTLs: If an entry has matches on more than one list, should it get a unique TTL? If so, should such a custom TTL on the multiply-matching entry be the longest TTL or the shortest TTL? I lean towards inheriting the shortest TTL from the matching source lists, plus setting a default TTL for the combined zone file to be near the longest.
Am I right in thinking that TTLs are largely irrelevant for rbldnsd, since it reloads zone info whenever the files change? In other words, does the rbldnsd cache clear for a given zone when the zone reloads, or do the cached entries with TTLs longer than the last reload interval remain in the cache? (I'm kind of hoping for the simpler case of rbldnsd clearing whenever reloading.)
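The shortest-TTL preference from item 4 could be sketched roughly as follows. The list names and TTL values in the usage note are invented for illustration, and the function names are hypothetical; nothing here reflects how rbldnsd itself treats TTLs.

```python
def combined_ttl(source_ttls, matching_lists):
    """TTL for an entry on several lists: the shortest TTL among them."""
    return min(source_ttls[name] for name in matching_lists)


def default_zone_ttl(source_ttls):
    """Default TTL for the combined zone: the longest source-list TTL."""
    return max(source_ttls.values())
```

For example, with assumed TTLs of 3600 seconds for one list and 900 for another, an entry on both would inherit 900, while the zone default would be 3600.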
5. We will likely want to combine the ws and be lists into a single entry in a combined list, probably using the .1 bit for both of them, since both lists contain the enumerated (non-wildcarded) domains from SA regular expressions. Also, things are moving towards combining the non-wildcarded domains of sa-blacklist and BigEvil/MidEvil, so this would somewhat short-circuit that process and future-proof things.
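As a rough sketch of how the combined data might be generated with ws and be sharing a bit: the .1 bit for both follows the suggestion above, while the "ph" name and the .2 bit for the phishing list are purely assumptions, as is the build_combined helper itself.

```python
# ws and be share the .1 bit as proposed above; "ph" and its .2 bit for
# the phishing list are assumptions made only for this example.
SOURCE_BITS = {"ws": 1, "be": 1, "ph": 2}


def build_combined(source_lists):
    """Map each domain to 127.0.0.x, where x ORs the bits of its source lists."""
    masks = {}
    for list_name, domains in source_lists.items():
        bit = SOURCE_BITS[list_name]
        for domain in domains:
            key = domain.lower()  # DNS names are case-insensitive
            masks[key] = masks.get(key, 0) | bit
    return {domain: f"127.0.0.{mask}" for domain, mask in sorted(masks.items())}
```

A domain that appears on both ws and be still gets only the .1 bit, so the overlap between the two lists disappears in the combined output.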
Comments?
Jeff C.