[SURBL-Discuss] mbox parser
tdeppner at surewest.net
Tue May 18 09:24:31 CEST 2004
On Tue, May 18, 2004 at 10:58:15AM -0400, Chris Santerre wrote:
> For what it is worth, I have abandoned all script extractions of URLs. It is
> NOT reliable. The human eye is better, and almost as fast. Simply look at
> the email source and search for every instance of HTTP. Then, using your
> firing synapses and some of the Jedi force powers, decide if what you are
> looking at is a legit link. Then report it. :-)
> Then a team of squirrels on crack will take that submission and run it
> through numerous tests of validity. Then it will be listed.
> To go forwards, we had to go backwards.
Ugh. You'll miss %xx encodings, MIME messages, and so forth. There's no way
you can analyze 45,000 messages by eye in any reasonable amount of time.
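Here is roughly what a script has to undo before a URL is even visible to a
human reading the source. This is a minimal sketch using only the Python
standard library. The name `extract_urls` and the deliberately loose URL regex
are assumptions for illustration, not anything SURBL uses:

```python
import email
import html
import re
from email import policy
from urllib.parse import unquote

# Deliberately loose pattern (an assumption): http(s):// followed by anything
# up to whitespace, a quote, or an angle bracket. Not a full URL grammar.
URL_RE = re.compile(r"""https?://[^\s"'<>]+""", re.IGNORECASE)


def extract_urls(raw: bytes) -> list[str]:
    """Return the URLs found in every text/* part of a raw RFC 822 message.

    get_content() undoes base64/quoted-printable and the part's charset;
    html.unescape() undoes entity tricks in HTML parts; unquote() then
    undoes %xx escapes inside each matched URL.
    """
    msg = email.message_from_bytes(raw, policy=policy.default)
    urls = []
    for part in msg.walk():
        # Skips multipart containers, images, and other non-text parts.
        if part.get_content_maintype() != "text":
            continue
        text = part.get_content()
        if part.get_content_subtype() == "html":
            # e.g. "&#104;ttp://" becomes "http://" before we search.
            text = html.unescape(text)
        for match in URL_RE.findall(text):
            urls.append(unquote(match))
    return urls
```

Neither `http://%77%77%77.example.com/` inside a quoted-printable part nor
`&#104;ttp://` inside an HTML alternative contains a literal "http://www" for a
plain text search to find. The sketch recovers both.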
We collect bounce messages and analyze them in aggregate. It's very simple,
and the high-volume stuff floats to the top.
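A rough sketch of the aggregate approach follows. It assumes the bounces are
saved to a single mbox file and that "in aggregate" means counting how many
messages mention each host. `hosts_in` and `top_hosts` are hypothetical names,
and the URL regex is an assumption:

```python
import mailbox
import re
from collections import Counter
from urllib.parse import unquote, urlsplit

# Loose URL pattern (an assumption), not a complete URL grammar.
URL_RE = re.compile(r"""https?://[^\s"'<>]+""", re.IGNORECASE)


def hosts_in(msg: mailbox.mboxMessage) -> set[str]:
    """Distinct (lowercased) hostnames linked from one message's text parts."""
    hosts = set()
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue
        # decode=True undoes base64 / quoted-printable transfer encoding.
        payload = part.get_payload(decode=True) or b""
        charset = part.get_content_charset() or "latin-1"
        try:
            text = payload.decode(charset, errors="replace")
        except LookupError:  # a charset name Python doesn't recognize
            text = payload.decode("latin-1")
        for url in URL_RE.findall(text):
            try:
                host = urlsplit(unquote(url)).hostname  # undoes %xx first
            except ValueError:  # e.g. a malformed [IPv6] host
                continue
            if host:
                hosts.add(host)
    return hosts


def top_hosts(mbox_path: str, n: int = 10) -> list[tuple[str, int]]:
    """Rank hostnames by the number of messages in the mbox that mention them."""
    counts = Counter()
    box = mailbox.mbox(mbox_path)
    try:
        for msg in box:
            counts.update(hosts_in(msg))  # once per message, not once per link
    finally:
        box.close()
    return counts.most_common(n)
```

Counting each host once per message is a design choice. It keeps one spam
with fifty links from outranking a host that shows up in hundreds of separate
bounces.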