On Saturday, February 12, 2005, 2:41:36 AM, Alain wrote:
- I've added a local skiplist with about the top half of the public
"whitelist"; no need to query those.
When you say half, that may be more than optimal (should be about 5000 records). SpamAssassin is using the top 125, which worked out to about the 50th percentile of all whitelist hits when we first set this up. (Now that result is skewed *because* SpamAssassin isn't checking those 125 any more, but their snapshot of the 125 is still probably useful.)
I'd say anything between 100 and 1000 would probably be a good compromise between list size and coverage.
The only disadvantage I see from a bigger local skiplist is some local CPU usage for every URI in an email. Most PCs have plenty of CPU power ;-) If this could become a problem, I can lower or optimise the local checking. Are there any other disadvantages?
One reason SpamAssassin didn't want to hard-code too many domains into their local whitelist was in case we needed to withdraw any, e.g. because they started spamming. The time between code releases can be many months, and some people may never update, so they wanted to be sure that only very hammy domains went into that list. (While Yahoo and Microsoft probably aren't going to start spamming any time soon, that may be less certain about some of the less commonly seen domains.)
But I'm glad that you're trying to minimize the DNS queries.
Jeff C. -- "If it appears in hams, then don't list it."
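
As an illustration of the sizing argument above (the top 125 domains covering roughly half of all whitelist hits), here is a minimal Python sketch of choosing a skiplist size by cumulative coverage. The function name `top_domains_for_coverage` and the hit counts are hypothetical, invented for this example; they are not real SURBL statistics or SpamAssassin code.

```python
def top_domains_for_coverage(hit_counts, target_fraction):
    """Return the most-hit domains whose combined hits first reach
    target_fraction of all hits (hypothetical helper, not SURBL code)."""
    total = sum(hit_counts.values())
    if total == 0:
        return []
    chosen = []
    running = 0
    # Most frequently hit domains first; ties broken by name for stable output.
    ranked = sorted(hit_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for domain, hits in ranked:
        if running / total >= target_fraction:
            break
        chosen.append(domain)
        running += hits
    return chosen


# Made-up hit counts, for illustration only.
sample_hits = {
    "a.example": 500,
    "b.example": 300,
    "c.example": 120,
    "d.example": 60,
    "e.example": 20,
}

half_coverage = top_domains_for_coverage(sample_hits, 0.5)
three_quarters = top_domains_for_coverage(sample_hits, 0.75)
```

With hit counts this skewed, a short list already covers most lookups, which is the argument for a list of hundreds of entries rather than thousands.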
Hi Jeff
- I've added a local skiplist with about the top half of the public
"whitelist"; no need to query those.
When you say half, that may be more than optimal (should be about 5000 records). SpamAssassin is using the top 125, which worked out to about the 50th percentile of all whitelist hits when we first set this up. (Now that result is skewed *because* SpamAssassin isn't checking those 125 any more, but their snapshot of the 125 is still probably useful.)
I'd say anything between 100 and 1000 would probably be a good compromise between list size and coverage.
The only disadvantage I see from a bigger local skiplist is some local CPU usage for every URI in an email. Most PCs have plenty of CPU power ;-) If this could become a problem, I can lower or optimise the local checking. Are there any other disadvantages?
One reason SpamAssassin didn't want to hard-code too many domains into their local whitelist was in case we needed to withdraw any, e.g. because they started spamming. The time between code releases can be many months, and some people may never update, so they wanted to be sure that only very hammy domains went into that list. (While Yahoo and Microsoft probably aren't going to start spamming any time soon, that may be less certain about some of the less commonly seen domains.)
Worst case is that a spammer gets hold of a few domains inside the skiplist and starts using them. For that, it's more useful to know which domains have the least risk of falling into the hands of spammers. This may be tricky.
But I'm glad that you're trying to minimize the DNS queries.
Well, I'd rather miss a few spams with the plugin than have SURBL go away.
Alain
On Saturday, February 12, 2005, 4:25:30 AM, Alain wrote:
The only disadvantage I see from a bigger local skiplist is some local CPU usage for every URI in an email. Most PCs have plenty of CPU power ;-) If this could become a problem, I can lower or optimise the local checking. Are there any other disadvantages?
One reason SpamAssassin didn't want to hard-code too many domains into their local whitelist was in case we needed to withdraw any, e.g. because they started spamming. The time between code releases can be many months, and some people may never update, so they wanted to be sure that only very hammy domains went into that list. (While Yahoo and Microsoft probably aren't going to start spamming any time soon, that may be less certain about some of the less commonly seen domains.)
Worst case is that a spammer gets hold of a few domains inside the skiplist and starts using them. For that, it's more useful to know which domains have the least risk of falling into the hands of spammers. This may be tricky.
One can argue that the more often used domains (e.g., Yahoo) are less likely to go away than the less frequently used ones. If so, then the less frequently appearing ones may be less appropriate for a hard-coded exclusion list.
But I'm glad that you're trying to minimize the DNS queries.
Well, I'd rather miss a few spams with the plugin than have SURBL go away.
SURBLs are not going away, but we need to keep watch on performance issues, like any system.
Jeff C. -- "If it appears in hams, then don't list it."
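
For readers following the CPU-cost question in this thread, here is a minimal Python sketch of a local skiplist check that runs before any DNS query. It assumes the skiplist is held in a hash set, so each lookup is constant time on average regardless of list size. The names `SKIPLIST`, `candidate_domains`, `is_skipped`, and `hosts_to_query` are hypothetical, and the parent-domain walk is a simplification; it is not how SpamAssassin or any particular plugin actually reduces a hostname to its registered domain.

```python
from urllib.parse import urlsplit

# Hypothetical skiplist; a real one would be loaded from the published list.
SKIPLIST = frozenset({"example.com", "example.org"})


def candidate_domains(hostname):
    """Yield the hostname and each parent domain with at least two labels."""
    labels = hostname.lower().rstrip(".").split(".")
    for i in range(len(labels) - 1):
        yield ".".join(labels[i:])


def is_skipped(hostname, skiplist=SKIPLIST):
    """True if the hostname or any of its parent domains is on the skiplist."""
    return any(domain in skiplist for domain in candidate_domains(hostname))


def hosts_to_query(uris, skiplist=SKIPLIST):
    """Return the set of hostnames that still need a DNS blocklist query."""
    pending = set()
    for uri in uris:
        host = urlsplit(uri).hostname  # None if the URI has no network location
        if host and not is_skipped(host, skiplist):
            pending.add(host)
    return pending


uris = [
    "http://www.example.com/index.html",
    "http://mail.example.org/",
    "http://spam.example.net/offer",
]
pending = hosts_to_query(uris)
```

Because set membership does not slow down noticeably as the set grows, the per-URI cost of a larger skiplist is small; the withdrawal problem Jeff describes is the stronger reason to keep the hard-coded list short.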