-----Original Message-----
From: Ted Deppner [mailto:tdeppner@surewest.net]
Sent: Tuesday, May 18, 2004 4:53 PM
To: discuss@lists.surbl.org
Subject: Re: [SURBL-Discuss] mbox parser
On Tue, May 18, 2004 at 10:58:15AM -0400, Chris Santerre wrote:
For what it is worth, I have abandoned all script extractions of URLs. It is NOT reliable. The human eye is better, and almost as fast. Simply look at the email source and search for every instance of HTTP. Then, using your firing synapses and some of the Jedi force powers, decide if what you are looking at is a legit link. Then report it. :-)
Then a team of squirrels on crack will take that submission and run it through numerous tests of validity. Then it will be listed.
To go forwards, we had to go backwards.
Ugh. You'll miss %xx encodings, MIME messages, and so forth.
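To illustrate the two pitfalls named above, here is a minimal Python sketch of a URL extractor that first decodes each MIME text part and then undoes %xx escapes. The function name `extract_urls` and the regular expression are assumptions made for this example, not anything used by SURBL or by the posters, and a real mbox parser would need to handle far more obfuscation than this.

```python
import re
from email import message_from_string
from urllib.parse import unquote

# Deliberately simple pattern (an assumption for this sketch): an http(s)
# scheme followed by anything up to whitespace, a quote, or an angle bracket.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def extract_urls(raw_message):
    """Return the URLs found in every text part of a raw message string."""
    msg = message_from_string(raw_message)
    urls = []
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue
        # decode=True undoes base64 / quoted-printable transfer encodings.
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            text = payload.decode(charset, errors="replace")
        except LookupError:
            # Unknown charset label: fall back rather than drop the part.
            text = payload.decode("utf-8", errors="replace")
        for url in URL_RE.findall(text):
            # Turn %xx escapes back into literal characters.
            urls.append(unquote(url))
    return urls
```

A plain search for "http" in the raw source would miss a body that is base64 encoded, or a link whose scheme is written as `http=3A//` in quoted-printable, which is the failure mode described above.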
I scan using Outlook (don't ask!). So everything is decoded.
No way you can analyze 45,000 messages in any reasonable amount of time.
Thanks to RBLs, and spammers realising they shouldn't send my company email or they get listed to bigevil, my spam numbers are going down. Also I've found that for every domain I add, there are 10 more spam with the same one that can be deleted straight off. So it goes pretty fast, when I get my lazy butt in gear to do it ;)
We collect bounce messages and analyze those in aggregate. Very simple and high volume stuff floats to the top.
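As a rough sketch of what "floats to the top" could look like in practice, the following Python counts how many messages mention each host and sorts by that count. The name `rank_domains` and the input shape (one list of URLs per message) are assumptions for illustration; the thread does not say how the bounce analysis is actually implemented.

```python
from collections import Counter
from urllib.parse import urlsplit


def rank_domains(urls_per_message):
    """Rank hosts by the number of messages that mention them.

    urls_per_message: iterable of lists of URL strings, one list per message.
    """
    counts = Counter()
    for urls in urls_per_message:
        hosts = set()  # count each host at most once per message
        for url in urls:
            try:
                host = urlsplit(url).hostname
            except ValueError:
                continue  # skip URLs the parser rejects
            if host:
                hosts.add(host)
        counts.update(hosts)
    # Highest-volume hosts come first.
    return counts.most_common()
```

Counting per message rather than per URL keeps one message that repeats a link many times from outweighing a host seen across many separate messages.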
You are correct on that. I did have some great success with the scripts. Mining the domains and working by hand might be ok, but I like to see the whole email. So while time consuming, this is the best method for me to get close to zero FPs.
Goes back to the old saying: "Faster, better, and cheaper. Pick only 2."
--Chris