Chris Santerre <csanterre@MerchantsOverseas.com> writes:
Hey there guys! This was the crazy idea I was discussing with Doc. I wished for a real-time DB or flat file that is updated continuously on rule hits. No grepping through logs or anything. Simply, when an email is sent through SA, increase a counter in a DB or flat file for each rule that hit. Separate DB or flat file for ham and spam. This gives live stats on a system. No grepping going on. Just a counter per rule.
That might give you a good hit rate, but it won't give you an accurate S/O number.
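To make the per-rule counter idea and the S/O caveat concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the class name `RuleHitCounter`, its methods, and the in-memory storage are illustrations, not part of SpamAssassin. It assumes S/O is computed as spam% / (spam% + ham%), where each percentage is the share of that class's messages the rule hit. Note that the `kind` label here is whatever the live system decided, so any misclassified message skews the counts, which is why this gives a hit rate but not a trustworthy S/O.

```python
from collections import Counter


class RuleHitCounter:
    """Hypothetical in-memory per-rule hit counter, kept separately for ham and spam."""

    def __init__(self):
        self.hits = {"ham": Counter(), "spam": Counter()}
        self.messages = {"ham": 0, "spam": 0}

    def record(self, kind, rules):
        """Count one scanned message of the given kind ("ham" or "spam")."""
        self.messages[kind] += 1
        # A rule counts at most once per message.
        self.hits[kind].update(set(rules))

    def hit_rate(self, kind, rule):
        """Fraction of messages of this kind that the rule hit."""
        total = self.messages[kind]
        return self.hits[kind][rule] / total if total else 0.0

    def s_o(self, rule):
        """Assumed definition: spam% / (spam% + ham%); None if the rule never hit."""
        spam_pct = self.hit_rate("spam", rule)
        ham_pct = self.hit_rate("ham", rule)
        overall = spam_pct + ham_pct
        return spam_pct / overall if overall else None


if __name__ == "__main__":
    counter = RuleHitCounter()
    counter.record("spam", ["RULE_A", "RULE_B"])
    counter.record("spam", ["RULE_A"])
    counter.record("ham", ["RULE_B"])
    counter.record("ham", [])
    for rule in ("RULE_A", "RULE_B"):
        print(rule, counter.s_o(rule))
```

A persistent version would swap the `Counter` objects for a DB or flat file, but the arithmetic stays the same.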
This is to be used on some advanced rule writing we want to work on. It also allows an admin to see which rules might not be worth keeping around, so they can remove poor performers and increase system speed.
Sort of like http://www.pathname.com/~corpus/DETAILS.new ?
The corpora have to be sorted by humans to be accurate, and runs need to be synchronized so that everyone tests the same rules. As a result, runs happen only once a day, which is fast enough.
We've been doing this for well over a year and it works great. If only we had more active developers working on rules...
Daniel
Is there any way we can get the message body URI domains (or even raw messages) from some of the ham lists?
The intent is not to "cheat" the S/O scores, but to compile a good, hand-checked whitelist of legitimate message body domains. *I don't really care where they come from*, but we have this nice, untapped source of them just sitting there, and a glaring need for some legitimate domains to keep off the blocklists.
I'm results-oriented. If there are some good hand-checked ham lists, I see no practical reason why we should not use them to generate message body domain whitelists.
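As a rough illustration of what pulling body URI domains out of hand-checked ham could look like, here is a minimal Python sketch. The function names `body_domains` and `count_ham_domains` and the URL regex are my own assumptions, not an existing tool. It only looks at `http`/`https` links in text parts and keeps full hostnames, so it does no registrar-level domain reduction.

```python
import re
from collections import Counter
from email import message_from_string
from urllib.parse import urlsplit

# Simple, deliberately loose pattern for http(s) URLs in message text.
URL_RE = re.compile(r"https?://[^\s<>\"')\]]+", re.IGNORECASE)


def body_domains(raw_message):
    """Return the set of lowercased hostnames linked from the text parts of one message."""
    msg = message_from_string(raw_message)
    domains = set()
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            text = payload.decode(charset, errors="replace")
        except LookupError:
            # Unknown charset name in the message: fall back to UTF-8.
            text = payload.decode("utf-8", errors="replace")
        for url in URL_RE.findall(text):
            try:
                host = urlsplit(url).hostname
            except ValueError:
                continue  # malformed URL, skip it
            if host:
                domains.add(host.lower())
    return domains


def count_ham_domains(raw_messages):
    """Count in how many ham messages each hostname appears."""
    counts = Counter()
    for raw in raw_messages:
        counts.update(body_domains(raw))
    return counts


if __name__ == "__main__":
    sample = (
        "From: a@example.com\n"
        "Subject: hi\n"
        "\n"
        "See http://www.Example.org/page and https://lists.example.net/x.\n"
    )
    print(sorted(body_domains(sample)))
```

Domains that show up across many hand-checked ham messages would be the candidates for a whitelist; a human would still need to review the result.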
Does anyone have any ham to share? :-)
Jeff C.