Jeff Chan wrote to the SURBL Discussion list:
We could probably experiment and try some different approaches and see how they test out on corpora and live mail servers.
A simple join(1) on the data files might be a better start:
SC and AB and WS and JP and OB
Matches 202 records. That's going to have an extremely low detection rate. The problem is that "and" means "intersection", so the result can never be larger than the smallest list involved. By including OB in particular, you're automatically limiting the maximum size of the data to about 350 records.
((SC or AB) and (JP or OB))
Matches 1,187 records. Probably still too few.
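To make the "and"/"or" point concrete, here is a minimal Python sketch using set operations. The domain names and list contents are made up for illustration (they are not real SURBL data); only the shape of the two expressions comes from the discussion above.

```python
# Hypothetical toy data: each list is a set of domains (not real SURBL contents).
sc = {"a.example", "b.example", "c.example"}
ab = {"a.example", "b.example"}
jp = {"a.example", "c.example", "d.example"}
ob = {"a.example"}
ws = {"a.example", "b.example", "d.example"}

# "SC and AB and WS and JP and OB": intersection of all five lists.
all_five = sc & ab & ws & jp & ob

# "((SC or AB) and (JP or OB))": union within each pair, then intersection.
mixed = (sc | ab) & (jp | ob)

print(len(all_five), sorted(all_five))
print(len(mixed), sorted(mixed))
```

Whatever the data, the all-"and" result cannot be larger than the smallest list in the expression, which is why the small list dominates.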
or PH
Didn't feel like pulling PH out of multi for this test.
Better, IMHO, is to use something like
(SC + AB + JP + OB + WS) >= 3
Matches 16,560 records. Aha! Now we're getting something useful.
Without WS in that equation, the number drops to 906.
With UC and WS, the number rises to 18,964.
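For anyone who wants to try the "N or more lists" idea on their own copies of the data, here is a rough Python sketch. The function name `domains_in_at_least` and the sample domains are my own inventions for illustration; loading the real data files is left out.

```python
from collections import Counter

def domains_in_at_least(lists, threshold):
    """Return the set of domains that appear in `threshold` or more lists."""
    votes = Counter()
    for domains in lists.values():
        # set() so a duplicate entry within one list only counts once
        votes.update(set(domains))
    return {domain for domain, n in votes.items() if n >= threshold}

# Hypothetical toy data (not real SURBL contents).
lists = {
    "sc": {"spam1.example", "spam2.example", "spam3.example"},
    "ab": {"spam1.example", "spam2.example"},
    "jp": {"spam1.example", "spam3.example", "spam4.example"},
    "ob": {"spam1.example"},
    "ws": {"spam1.example", "spam2.example", "spam3.example", "spam4.example"},
}

three_plus = domains_in_at_least(lists, 3)
four_plus = domains_in_at_least(lists, 4)
print(sorted(three_plus))
print(sorted(four_plus))
```

Adding or dropping a list is just a matter of changing the `lists` dictionary, so it is easy to compare variants such as with and without WS.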
Other numbers, with SC + AB + JP + OB + WS + UC:
SC+AB+JP+OB+WS+UC    # of records
-----------------    ------------
        1                39,759
        2                25,549
        3                16,369
        4                 2,298
        5                   292
        6                     5

     >= 2                44,513    superset of ...
     >= 3                18,964    ...
     >= 4                 2,595    ..
     >= 5                   297    .
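The ">=" figures are just running totals of the exact counts, taken from the top down. A small Python sketch of that arithmetic, using the record counts quoted above (the helper name `at_least` is mine):

```python
# Number of domains appearing in exactly n of the six lists (figures from above).
exact = {1: 39759, 2: 25549, 3: 16369, 4: 2298, 5: 292, 6: 5}

def at_least(exact_counts):
    """Map each n to the number of domains appearing in n or more lists."""
    running_total = 0
    result = {}
    for n in sorted(exact_counts, reverse=True):
        running_total += exact_counts[n]
        result[n] = running_total
    return result

cumulative = at_least(exact)
for n in sorted(cumulative):
    print(f">= {n}: {cumulative[n]:,}")
```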
You can try this with different lists if you want, or even mix in some judicious "and" and "or" matching. For instance, since there is a large overlap between JP and WS, you might want to choose one or the other. But maybe it doesn't matter so much, because, in that case, you might just set the cutoff lower to compensate, so having the additional list would still add some small bit of confidence.
To me, 3 currently looks like the likely sweet spot, although the hit rate on the ~2,500 domains present in four or more lists could still potentially put a sizeable dent in spam at the MTA level at a lower FP rate. I'd recommend looking at 3 and 4 a little more closely:
http://ry.ca/surbl/ab+jp+ob+sc+ws+uc3.txt
http://ry.ca/surbl/ab+jp+ob+sc+ws+uc4.txt
By definition, the 4-or-more list is a subset of the 3-or-more list, so if FP(n>=N) is the false positive rate of a list with domains in N-or-more lists, FP(n>=3) >= FP(n>=4). Thus, this approach also has the added benefit of letting you control the FP rate somewhat, at least in discrete steps.
Have fun!

- Ryan