For what it is worth, I have abandoned all script extraction of URLs. It is NOT reliable. The human eye is better, and almost as fast. Simply look at the email source and search for every instance of HTTP. Then, using your firing synapses and some of the Jedi force powers, decide if what you are looking at is a legit link. Then report it. :-)
Then a team of squirrels on crack will take that submission and run it through numerous tests of validity. Then it will be listed.
To go forwards, we had to go backwards.
--Chris
-----Original Message-----
From: David Coulson [mailto:david@davidcoulson.net]
Sent: Monday, May 17, 2004 8:08 PM
To: discuss@lists.surbl.org
Subject: [SURBL-Discuss] mbox parser
I've got a decent mailbox containing a variety of spam e-mail. Is there a nice little Perl script out there which will spit out the URLs so I can submit them to Bill's list?
David
--
David Coulson                 email: d@vidcoulson.com
Linux Developer /             web:   http://davidcoulson.net/
Network Engineer              phone: (216) 533-6967
Discuss mailing list Discuss@lists.surbl.org http://lists.surbl.org/mailman/listinfo/discuss
On Tue, May 18, 2004 at 10:58:15AM -0400, Chris Santerre wrote:
> For what it is worth, I have abandoned all script extraction of URLs. It is NOT reliable. The human eye is better, and almost as fast. Simply look at the email source and search for every instance of HTTP. Then, using your firing synapses and some of the Jedi force powers, decide if what you are looking at is a legit link. Then report it. :-)
> Then a team of squirrels on crack will take that submission and run it through numerous tests of validity. Then it will be listed.
> To go forwards, we had to go backwards.
Ugh. By eye you'll miss %xx encodings, MIME-encoded messages, and so forth. And there is no way you can analyze 45,000 messages by hand in any reasonable amount of time.
We collect bounce messages and analyze those in aggregate. It is very simple, and the high-volume stuff floats to the top.
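For anyone who wants to try the scripted route on an mbox, here is a minimal Python sketch of the two ideas above: decode MIME parts and %xx escapes before looking for URLs, then count hosts across the whole mailbox so the high-volume ones come out first. It is not the setup anyone in this thread actually runs. The names `extract_urls` and `host_counts`, the regex, and the file name `spam.mbox` are all assumptions made for illustration, and the regex is deliberately rough, so the output still wants a human check before anything gets submitted.

```python
import mailbox
import re
from collections import Counter
from urllib.parse import unquote, urlsplit

# Rough pattern (an assumption, not a full URL grammar): http/https up to
# whitespace, quotes, angle brackets or parentheses.
URL_RE = re.compile(r"https?://[^\s\"'<>()]+", re.IGNORECASE)


def extract_urls(msg):
    """Return URLs found in the decoded text parts of one message."""
    urls = []
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue
        # decode=True undoes base64 / quoted-printable transfer encoding.
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "latin-1"
        try:
            text = payload.decode(charset, errors="replace")
        except LookupError:  # unknown charset label in the message
            text = payload.decode("latin-1", errors="replace")
        # Undo %xx escapes so obfuscated hosts match their plain form.
        urls.extend(unquote(u) for u in URL_RE.findall(text))
    return urls


def host_counts(messages):
    """Count, per host, how many messages mention it at least once."""
    counts = Counter()
    for msg in messages:
        hosts = set()
        for url in extract_urls(msg):
            try:
                host = urlsplit(url).hostname  # already lowercased
            except ValueError:  # malformed URL, skip it
                continue
            if host:
                hosts.add(host)
        counts.update(hosts)
    return counts


if __name__ == "__main__":
    # "spam.mbox" is a hypothetical path; point it at your own mailbox.
    for host, n in host_counts(mailbox.mbox("spam.mbox")).most_common(20):
        print(n, host)
```

Counting each host once per message, rather than once per occurrence, keeps a single message that repeats the same link fifty times from dominating the list.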