At 09:49 22/04/2004, you wrote:
- BigEvil wildcards. Not sure how you would handle these. Something like evil\d{2,4}spam.com is a general wildcard. Some of those domains don't even exist. Not sure how SURBL will handle that.
Yes, I should have mentioned that I'm simply discarding them. Unfortunately there's no easy way to deal with them. Domains without any patterns in them, which are the majority, come right through. The script is at:
Can we make sure that when you announce this to the public, they know this! :) I can see the flurry of emails now.
Right near the top of http://spamcheck.freeapp.net/bigevil.domains.afterwhitelist there is 123-ebiz - is that a mistake or a parsing error?
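As a rough illustration of the discarding step described above (this is not the actual script, only a minimal Python sketch): entries that contain regex metacharacters are set aside, and plain domains pass straight through. The metacharacter set, the name split_entries, and the two literal sample domains are assumptions made for the example; only evil\d{2,4}spam.com comes from this thread.

```python
import re

# Characters that suggest an entry is a regex pattern rather than a plain
# domain. The dot is left out on purpose, since every domain contains one.
# ASSUMPTION: this set is illustrative, not what the real script checks.
PATTERN_CHARS = re.compile(r"[\\\[\]{}()*+?|^$]")


def split_entries(entries):
    """Return (plain_domains, wildcard_patterns) from a list of entries."""
    plain, wildcards = [], []
    for entry in entries:
        entry = entry.strip()
        if not entry:
            continue  # skip blank lines
        if PATTERN_CHARS.search(entry):
            wildcards.append(entry)  # would be discarded for a DNS-based list
        else:
            plain.append(entry)
    return plain, wildcards


# Hypothetical sample input; only the wildcard entry is from the thread.
sample = ["spam-example-one.com", r"evil\d{2,4}spam.com", "spam-example-two.biz"]
plain, wildcards = split_entries(sample)
```

Anything that ends up in wildcards is simply not published, which is why the announcement should say so up front.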
But frankly I like the fact that there is some overlap in the lists. In a sense that represents multiple reporting; i.e. a domain in more than one list is more likely a bad guy. I don't think we should lose that coding.
YMMV, but I'd say keep any overlap in BE. It's a feature not a bug.
I think so too. What some of the people suggesting a merge are forgetting is that, with lists drawn from totally different sources, whether a given URL is listed in one, two, or three of the lists IS an extra piece of information: something listed in all three is more likely to be correct than something listed in only one.
The SA approach of assigning a score to each list based on its relative merits, and the scores ADDING if they're in multiple lists seems to be a sensible approach to me...
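To make the additive idea concrete, here is a minimal Python sketch. It is not SpamAssassin code; the list names other than "be", the score values, and the sample domains are all made up for illustration.

```python
# ASSUMPTION: hypothetical per-list scores reflecting each list's
# relative merits. Real values would be tuned, not hard-coded like this.
LIST_SCORES = {"list_a": 1.0, "list_b": 1.5, "be": 2.0}


def score_domain(domain, lists):
    """Sum the scores of every list that contains the domain."""
    return sum(
        score
        for name, score in LIST_SCORES.items()
        if domain in lists.get(name, set())
    )


# Hypothetical list contents with deliberate overlap between sources.
lists = {
    "list_a": {"spam-example-one.com", "spam-example-two.biz"},
    "list_b": {"spam-example-one.com"},
    "be": {"spam-example-one.com", "spam-example-two.biz"},
}

in_all_three = score_domain("spam-example-one.com", lists)
in_two = score_domain("spam-example-two.biz", lists)
in_none = score_domain("innocent.example", lists)
```

A domain on all three lists ends up with a higher total than one on two, and a merged list would throw that difference away.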
Of course there is nothing to stop you having merged lists available AS WELL for those that are willing to take the risk of one higher-scoring merged list... with choice, everyone is happy ;)
By the way, am I jumping the gun here, or is be.surbl.org ready to go? Or should I wait a bit? :)
Regards, Simon