Ryan Thompson wrote to SURBL Discussion list:
XBL is an excellent list of spam senders, by far the biggest catcher of spam senders in my regular RBLs, so it probably would be good as a header check for GetURI also. Ryan, can we make this a feature request?
Sure. Now it's making sense. :-) Fortunately, adding header checks will be easy, because I'm already using the SpamAssassin engine.
OK, I've tried this, but it slows down the runs considerably. My 2K test corpus had 54 RCVD_IN_XBL hits but, for some reason, *none* of those messages contained domains that were not already listed in SURBL. The run took 26 minutes, instead of the usual 2-3 minutes for the 2K corpus.
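For readers unfamiliar with how an IP-based list such as XBL is queried, here is a minimal sketch of the usual DNSBL convention: reverse the IPv4 octets and append the list's zone. This is an illustration in Python, not GetURI or SpamAssassin code. The function name dnsbl_query_name is made up for the example, and xbl.spamhaus.org is assumed to be the zone behind RCVD_IN_XBL.

```python
import ipaddress


def dnsbl_query_name(ip: str, zone: str = "xbl.spamhaus.org") -> str:
    """Build the DNS name that is looked up to test an IPv4 address against a DNSBL zone.

    Hypothetical helper for illustration; the default zone is an assumption.
    """
    # Validate the address first; malformed input raises ValueError.
    addr = ipaddress.IPv4Address(ip)
    # DNSBLs use the same reversed-octet layout as in-addr.arpa.
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


print(dnsbl_query_name("192.0.2.10"))  # 10.2.0.192.xbl.spamhaus.org
```

Each Received-header address checked this way costs at least one DNS round trip, which is consistent with the slowdown described above.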
Then I used the new --surbl=hostname option to check against WS only (instead of the default multi), and found only 2/381 (0.5%) domains spamvertised by an XBL-listed host.
Hmm. Then I fed the --surbl option a local "dummy" SURBL list containing only test entries, effectively disabling the SURBL filter in GetURI, and got 52/3130 (1.7%) domains whose messages hit RCVD_IN_XBL.
So, given the low hit rate (especially in the usual case of only looking for new SURBL domains) and the tremendous amount of extra time required to do the XBL header/net test (the last run took 48 minutes, compared to ~16 minutes without the header tests), I'm going to make GetURI default to *not* doing the header checks, and let people enable them with the new --header option.
With all of these new DNS tests, network delays are now definitely the bottleneck in GetURI. Soon (not for 1.6, maybe 1.7), I think I'm going to have to go to a forked or threaded model.
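To illustrate why a threaded model helps when network delay is the bottleneck, here is a rough sketch in Python. It is not GetURI code and says nothing about how GetURI will actually be restructured. The names is_listed and check_all are hypothetical, and the sketch assumes that "listed" simply means the query name resolves to an address.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def is_listed(name: str) -> bool:
    """Return True if the query name resolves, which is how DNS lists signal a hit."""
    try:
        socket.gethostbyname(name)
        return True
    except OSError:
        # Covers NXDOMAIN and other resolver failures (socket.gaierror is an OSError).
        return False


def check_all(names, lookup=is_listed, workers=20):
    """Run the lookups concurrently and return a {name: listed} mapping.

    The threads spend most of their time waiting on the network, so the
    waits overlap instead of adding up one after another.
    """
    names = list(names)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lookup, names))
    return dict(zip(names, results))
```

The lookup argument is there so that a stub resolver can be swapped in for testing without touching the network. With 20 workers, a batch of lookups that each block for a fixed delay should finish in roughly the time of the slowest batch rather than the sum of all delays.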
- Ryan