Ryan Thompson has written some code to extract URIs from messages, using SpamAssassin 3.0 code. It could be useful for spam fighters in general, and in particular those feeding URI domains and IP addresses into URI blocklists such as SURBLs.
Jeff C.
---------- Forwarded message ----------
Date: Mon, 23 Aug 2004 11:13:46 -0600 (CST)
From: Ryan Thompson
To: SURBL Discussion list
Subject: Geturi 1.4 released! (The SURBL URI classification helper)
Finally. I've been taunting you poor folks for weeks, now. :-) Here it is:
http://ry.ca/geturi/ -- geturi v1.4
From the DESCRIPTION:
geturi is designed to process a directory containing a list of RFC822 messages (one message per file). It analyses each message, attempts to strip out as many unclickable URIs as possible, and then compiles the list of found URIs, putting HTML output on STDOUT.
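To give a feel for the workflow described above, here is a minimal Python sketch of the same general idea: read a directory of one-message-per-file mail, pull URIs out of the text parts, and print an HTML list on STDOUT. It is not geturi's code and does not use SpamAssassin. The regular expression and the function names (extract_uris, message_text, uris_to_html) are assumptions made up for illustration, and the sketch makes no attempt to filter out unclickable URIs.

```python
import re
import sys
from email import message_from_binary_file, policy
from html import escape
from pathlib import Path

# Simplistic pattern for http/https URIs. This is an assumption for the
# sketch, not the matching logic geturi or SpamAssassin actually uses.
URI_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def extract_uris(text):
    """Return the URIs found in text, in order, without duplicates."""
    seen = []
    for match in URI_RE.findall(text):
        # Drop trailing punctuation that usually belongs to the sentence.
        uri = match.rstrip(".,;:!?)")
        if uri not in seen:
            seen.append(uri)
    return seen


def message_text(path):
    """Concatenate the decoded text/* parts of one RFC 822 message file."""
    with open(path, "rb") as handle:
        msg = message_from_binary_file(handle, policy=policy.default)
    chunks = []
    for part in msg.walk():
        if part.get_content_maintype() == "text":
            try:
                chunks.append(part.get_content())
            except (LookupError, UnicodeDecodeError):
                # Skip parts whose declared charset cannot be decoded.
                continue
    return "\n".join(chunks)


def uris_to_html(uris):
    """Render the URIs as a plain HTML list of links."""
    items = "\n".join(
        '<li><a href="{0}">{0}</a></li>'.format(escape(u, quote=True))
        for u in uris
    )
    return "<ul>\n" + items + "\n</ul>"


def main(directory):
    """Scan every file in directory and print the combined URI list."""
    found = []
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            for uri in extract_uris(message_text(path)):
                if uri not in found:
                    found.append(uri)
    print(uris_to_html(found))


if __name__ == "__main__":
    if len(sys.argv) == 2:
        main(sys.argv[1])
```

Saved under a hypothetical name such as uri_sketch.py, it could be run as "python uri_sketch.py /path/to/messages > uris.html". Anyone feeding a blocklist would still need to review the output by hand, since the sketch lists every URI it finds.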
What I'd *like* to see is a bunch of people using this, and some suggestions for improvement (I already have quite a few, some of which are in the TODO section of the documentation). I'd call this alpha code at the moment, for want of testers, but I don't know of any huge bugs.
Feedback more than welcome!
- Ryan