Chris Santerre wrote to 'SURBL Discussion list':
I'm thinking the best way would be to take the actual SURBL code, and use it to rip out domains. But the SA code to unencode the email would be needed as well. Putting this in the email pipe would be best.
If spam is not found in SURBL, run SURBL extract + append to file
Here's some sample output from the script I made a couple of months ago:
http://ry.ca/spam/results.html
It already does exactly what you describe, using the SA3 plugin, ignoring any domains on the whitelists and bl[ao]cklists. It attempts to display the results in a very readable way to assist with hand-checking. The "Score" assigned is just a rough heuristic designed to separate likely spammer domains from poisoning attempts.
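For anyone who wants to try the filtering step without the script, here is a minimal Python sketch of the "ignore anything already listed, tally the rest" idea. It is not the script behind the results page: it assumes the domains have already been extracted from each message (the real script relies on the SA3 plugin for that), it does not attempt the "Score" heuristic, and the function and parameter names are made up for illustration.

```python
from collections import Counter

def tally_unlisted(domains_per_message, whitelist, blacklist):
    """Count, per domain, how many messages mention it, skipping listed domains.

    domains_per_message: iterable of iterables of domain strings, one per spam.
    whitelist / blacklist: iterables of domains that need no hand-checking.
    All names here are hypothetical; extraction is assumed to happen elsewhere.
    """
    known = {d.lower() for d in whitelist} | {d.lower() for d in blacklist}
    counts = Counter()
    for domains in domains_per_message:
        # Count each domain once per message so one noisy spam cannot inflate it.
        unique = {d.lower() for d in domains}
        counts.update(unique - known)
    # Most frequently seen domains first.
    return counts.most_common()
```

The output is a list of (domain, message count) pairs, which is roughly the raw material a hand-checking pass would start from.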
It still needs a little work: detecting valid IP URIs (http://24.0.0.1/) and checking them with the octets reversed (1.0.0.24.*.surbl.org), and fixing some infrequent TLD chopping issues.
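The reversed-IP check could look something like the following Python sketch. It only builds the name to look up; it does not perform the DNS query. The function name is hypothetical, and the zone is left as a parameter because the "*" above is not filled in here.

```python
import ipaddress
from urllib.parse import urlsplit

def reversed_ip_query(uri, zone):
    """Return the reversed-octet lookup name for an IPv4 URI host, else None.

    zone is whatever list zone you query against; it is a caller-supplied
    assumption, not something this sketch knows.
    """
    host = urlsplit(uri).hostname
    if host is None:
        return None
    try:
        addr = ipaddress.IPv4Address(host)
    except ValueError:
        # Not a dotted-quad IPv4 address (e.g. an ordinary domain name).
        return None
    octets = str(addr).split(".")
    return ".".join(reversed(octets)) + "." + zone
```

With a zone of "multi.surbl.org" (an assumed value for the example), http://24.0.0.1/ turns into 1.0.0.24.multi.surbl.org, and a URI with an ordinary hostname returns None so it can go through the normal domain path instead.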
Even in its current form, it saves me *hours* of hand-checking. It also supports local whitelists and blacklists, so I generally feed it about 500-1000 spams at a time, depending on how ambitious I feel, and go from there. I make a few passes, first picking out the poisoning attempts (most of those are easy to spot in the second list) and malformed URIs, and throwing those into the local whitelist. That usually weeds out over half of the remaining URIs. I keep making passes in increasing order of uncertainty until I'm left with about a dozen really icky ones that are tough to classify. It works well, because by then I'm sick of looking at URIs anyway, and those tough ones are usually best left alone, to avoid FPs. :-)
I'm really glad I added the NANAS links. I also do this from a "safe" browser so I can open up the message text (in the second list on the page, but my site won't give you guys access to that :-)), or click on the spammer URL to check out their site.
And, the way my mind works, I'll use this a few times and gradually add more automation as I become simultaneously annoyed with the repetition, and comfortable with my (previously human) algorithm for automation.
- Ryan
:)
--Chris