Given the probable need to improve whitelisting, I've added a log of domains that would go onto sc.surbl.org but are then prevented from getting onto the list by the whitelist(s):
http://www.surbl.org/whitelist-hits.new.log
That goes along with the log of new additions to sc.surbl.org, i.e., essentially a blacklisting log:
http://www.surbl.org/top-sites-domains.new.log
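To make the relationship between the two logs concrete, here is a minimal sketch of the split. It is not the actual engine code: the function name split_candidates, the normalization step, and the sample domains are all assumptions made for illustration. The only thing taken from the description above is that a candidate domain either lands on the list or is held back by the whitelist, and that each outcome is logged.

```python
def split_candidates(candidates, whitelist):
    """Partition candidate domains into (new_additions, whitelist_hits).

    Hypothetical helper: the real engine's matching rules are not described,
    so this simply compares lowercased names with any trailing dot removed.
    """
    whitelist = {d.lower().rstrip(".") for d in whitelist}
    additions, hits = [], []
    for domain in candidates:
        key = domain.lower().rstrip(".")
        if key in whitelist:
            hits.append(key)       # would go to the whitelist-hits log
        else:
            additions.append(key)  # would go to the new-additions log
    return additions, hits


# Made-up sample data using reserved .example names
additions, hits = split_candidates(
    ["spammy-pills.example", "Popular-Site.example", "cheap-loans.example"],
    ["popular-site.example", "another-big-site.example"],
)
print("additions:", additions)
print("whitelist hits:", hits)
```

In this sketch a whitelisted domain never reaches the additions list, which is the behavior the whitelist-hits log is meant to make visible.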
I've also grabbed a copy of 500 popular web site domains for addition to the whitelist. A couple of the recent whitelist hits have come from it. So far they seem reasonable.
Whitelisting will continue in the next version of the engine, hopefully with some larger data sets.
Blacklisting based on SpamCop URI domain data will hopefully also be more stable and broader in the next version. In other words, there should be significantly less activity on the blacklist log, since the list itself will be more stable. (For example, under the current system you may see some domains come off the list and then get back on it... Pay no attention to the man behind the curtain... :-) There should be a lot less of that.)
Jeff C.
On Wednesday, April 14, 2004, 6:27:43 PM, Jeff Chan wrote:
> Given the probable need to improve whitelisting, I've added a log of domains that would go onto sc.surbl.org but are then prevented from getting onto the list by the whitelist(s):
I should have added that it's also possible, and perhaps likely, for the same whitelisted domain to appear more than once in the log, since the data is currently fairly short-lived given the 4-day expiration. The new engine will probably have a base 10-day expiration and some other, longer memories.
Multiple whitelist hits are essentially a non-problem, because they mean those domains are being misreported but are not getting onto the list.
Multiple blacklist appearances by the same domain are suboptimal, however, since they mean probable spam domains could get missed until the reporting threshold that triggers list inclusion is reached again. That will be addressed....
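The effect of the expiration period on repeat appearances can be illustrated with a rough sketch. This is only a toy model, not the engine itself: it assumes a domain is listed whenever the number of unexpired reports meets a threshold, and the threshold of 3, the function names, and the report dates are all invented. Only the 4-day and 10-day expiration figures come from the text above.

```python
from datetime import datetime, timedelta

THRESHOLD = 3  # assumed value; the real reporting threshold is not stated here


def listed(report_times, now, expiry_days, threshold=THRESHOLD):
    """True if enough unexpired reports exist at time `now` to list the domain."""
    cutoff = now - timedelta(days=expiry_days)
    live = [t for t in report_times if cutoff < t <= now]
    return len(live) >= threshold


# Invented report history for one domain: a burst, a gap, then another burst
day0 = datetime(2004, 4, 1)
reports = [day0 + timedelta(days=d) for d in (0, 0, 1, 6, 7, 7)]


def status_by_day(expiry_days, days=range(1, 9)):
    """Listed/unlisted status checked at noon on each day after day0."""
    return [
        listed(reports, day0 + timedelta(days=d, hours=12), expiry_days)
        for d in days
    ]


print("4-day expiry: ", status_by_day(4))
print("10-day expiry:", status_by_day(10))
```

With the 4-day window in this toy model, the domain is listed, drops off during the gap, and is listed again only once the threshold is met a second time, which is the kind of repeat appearance described above. With the 10-day window the same reports keep it listed throughout.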
Jeff C.