[SURBL-Discuss] mbox parser
csanterre at merchantsoverseas.com
Wed May 19 11:05:22 CEST 2004
>From: Ted Deppner [mailto:tdeppner at surewest.net]
>Sent: Tuesday, May 18, 2004 4:53 PM
>To: discuss at lists.surbl.org
>Subject: Re: [SURBL-Discuss] mbox parser
>On Tue, May 18, 2004 at 10:58:15AM -0400, Chris Santerre wrote:
>> For what it is worth, I have abandoned all script extractions of URLs.
>> It is NOT reliable. The human eye is better, and almost as fast. Simply
>> look at the email source and search for every instance of HTTP. Then,
>> using your firing synapse and some of the jedi force powers, decide if
>> what you are looking at is a legit link. Then report it. :-)
>> Then a team of squirrels on crack will take that submission and run it
>> through numerous tests of validity. Then it will be listed.
>> To go forwards, we had to go backwards.
>Ugh. You'll miss %xx encodings, MIME messages, and so forth.
I scan using Outlook (don't ask!). So everything is decoded.
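For anyone who would rather script the decoding step than lean on a mail client, here is a rough sketch of the two things Ted mentions: undoing the MIME transfer encoding and then the %xx escapes before looking for links. It uses only Python's standard library. The function name extract_hosts and the URL pattern are made up for illustration and will certainly miss some obfuscation tricks.

```python
import email
import re
from email import policy
from urllib.parse import unquote, urlsplit

# Deliberately simple pattern (an assumption, not a complete URL grammar).
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)

def extract_hosts(raw_message: bytes) -> set:
    """Hypothetical helper: hostnames found in the text parts of one message."""
    msg = email.message_from_bytes(raw_message, policy=policy.default)
    hosts = set()
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue  # skip multipart containers, images, attachments
        try:
            # get_content() undoes base64 / quoted-printable and the charset
            text = part.get_content()
        except (LookupError, UnicodeDecodeError):
            continue  # unknown or broken charset: leave it for a human
        for url in URL_RE.findall(text):
            try:
                # unquote() turns %xx escapes back into plain characters
                host = urlsplit(unquote(url)).hostname
            except ValueError:
                continue  # malformed URL, e.g. unbalanced brackets
            if host:
                hosts.add(host)
    return hosts
```

This only gives candidate hostnames. Deciding whether one is a legit link is still a separate step.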
>No way you can analyze 45,000 messages in any reasonable amount of time.
Thanks to RBLs, and spammers realising they shouldn't send my company email
or they get listed to bigevil, my spam numbers are going down. Also I've
found that for every domain I add, there are 10 more spam with the same one
that can be deleted straight off. So it goes pretty fast, when I get my lazy
butt in gear to do it ;)
>We collect bounce messages and analyze those in aggregate. Very simple
>and high volume stuff floats to the top.
You are correct on that. I did have some great success with the scripts.
Mining the domains and working by hand might be ok, but I like to see the
whole email. So while time consuming, this is the best method for me to get
close to zero FPs.
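The aggregate approach Ted describes, where the high volume stuff floats to the top, can be sketched roughly like this. It assumes the message bodies have already been decoded to plain strings. The name rank_hosts and the URL pattern are illustrative only, not anything from an existing script.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Deliberately simple pattern (an assumption, not a complete URL grammar).
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)

def rank_hosts(bodies):
    """Hypothetical helper: (hostname, message count) pairs, most frequent first."""
    counts = Counter()
    for body in bodies:
        seen = set()
        for url in URL_RE.findall(body):
            try:
                host = urlsplit(url).hostname
            except ValueError:
                continue  # malformed URL, skip it
            if host:
                seen.add(host)
        counts.update(seen)  # one vote per message, not per link
    return counts.most_common()
```

Counting each hostname once per message keeps a single mail stuffed with fifty copies of the same link from outranking a domain that shows up in many separate messages. The top of the list is then what gets looked at by hand.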
Goes back to the old saying:
"Faster, better, and cheaper. Pick only 2."