On Friday, November 12, 2004, 4:00:47 AM, David Hooton wrote:
On Fri, 12 Nov 2004 02:45:15 -0800, Jeff Chan jeffc@surbl.org wrote:
Pondering the question of how to make a "telco grade" SURBL that had as close to zero false positives as possible but would still catch many spams, I remembered that many of the biggest spam domains seem to appear in several different SURBL lists.
What does anyone think about creating a "consensus" list that a telco or ISP might use to block at the MTA level?
For example, a domain that appears on:
((SC or AB) and (JP or OB)) or PH
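To make the rule above concrete, here is a minimal sketch of it as a membership test. The function name `in_consensus` and the idea of passing in a set of list abbreviations per domain are assumptions for illustration only; nothing here reflects how SURBL data is actually built or stored.

```python
def in_consensus(lists):
    """Return True if a domain qualifies for the hypothetical consensus list.

    `lists` is the set of SURBL list abbreviations the domain appears on,
    e.g. {"SC", "JP"}. The rule is the one proposed in the thread:
    ((SC or AB) and (JP or OB)) or PH
    """
    on_first_pair = "SC" in lists or "AB" in lists
    on_second_pair = "JP" in lists or "OB" in lists
    return (on_first_pair and on_second_pair) or "PH" in lists


# A domain on SC and JP qualifies; one on SC and AB only does not,
# because both of those fall in the same half of the rule.
print(in_consensus({"SC", "JP"}))
print(in_consensus({"SC", "AB"}))
print(in_consensus({"PH"}))
```

The point of the structure is that a domain must be confirmed by two independent groups of lists, unless it is on PH, which is accepted on its own.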
I think the percentile-based lists are probably the best way to go, i.e. the top 50% of all requested SURBL-listed domains or something like that?
Percentiles are good, but they're only possible when you have frequencies of reports, queries, etc. The only list I have report frequencies for is SC, so it's not possible for me to compare percentiles across other lists.
One thing we could take percentiles on is DNS queries, and that could be useful, but it doesn't exclude FPs. If we didn't whitelist w3.org for example, it would have lots of DNS query FPs. Frequencies of DNS query hits against blocklists could get us an approximation of the "top spammers" with some possible FPs included among the most frequent queries.
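As a rough sketch of the DNS-query idea, the snippet below ranks listed domains by query count, drops whitelisted ones first, and keeps the top fraction. The function name `top_queried`, the input shapes, and all the sample numbers and domains are made up for illustration; real query logs would need their own parsing, and whitelisting only removes the FPs you already know about.

```python
import math


def top_queried(query_counts, whitelist, fraction=0.5):
    """Return the most-queried domains, excluding whitelisted ones.

    query_counts: dict mapping domain -> number of DNS query hits (assumed input)
    whitelist:    set of domains that must never be listed
    fraction:     share of the remaining domains to keep, e.g. 0.5 for the top 50%
    """
    candidates = {d: c for d, c in query_counts.items() if d not in whitelist}
    # Sort by descending count, breaking ties by name for a stable result.
    ranked = sorted(candidates, key=lambda d: (-candidates[d], d))
    keep = math.ceil(len(ranked) * fraction)
    return ranked[:keep]


# Hypothetical sample data: w3.org is heavily queried but whitelisted.
counts = {
    "w3.org": 9000,
    "spam-a.example": 500,
    "spam-b.example": 300,
    "spam-c.example": 20,
    "spam-d.example": 5,
}
print(top_queried(counts, whitelist={"w3.org"}))
```

Without the whitelist step, w3.org would sit at the top of the ranking, which is exactly the kind of FP described above.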
We should probably work on developing some more diverse spamtrap feeds. Quite a lot of ISPs have well-established spamtraps that they are either not using or are completely underutilising.
Lists like SC, AB and JP all seem to be good data sources, but if you were trying to be certain of 0 FPs you'd need something to reliably and continuously check your data against and rebuild it from.
More traps and more data are definitely desirable, but we're also interested in seeing if we can make smarter use of the existing data, so thanks for your suggestions.
Jeff C. -- "If it appears in hams, then don't list it."