I've got a decent mailbox containing a variety of spam e-mail. Is there a nice little Perl script out there which will spit out the URLs so I can submit them to Bill's list?
David
On Monday, May 17, 2004, 5:07:32 PM, David Coulson wrote:
> I've got a decent mailbox containing a variety of spam e-mail. Is there a nice little Perl script out there which will spit out the URLs so I can submit them to Bill's list?
Hi David,
Someone else asked about this recently, saying he could not find a good message body URI parser. Presumably the reason is that it's a little more complicated than it may seem at first, given the need to decode MIME, handle weird cases, etc. I suggested starting with some of the code from SpamCopURI or urirhsbl from the SA 3.0 URIBL module.
Jeff C.
On Mon, May 17, 2004 at 05:19:39PM -0700, Jeff Chan wrote:
> On Monday, May 17, 2004, 5:07:32 PM, David Coulson wrote:
>> I've got a decent mailbox containing a variety of spam e-mail. Is there a nice little Perl script out there which will spit out the URLs so I can submit them to Bill's list?
> Someone else asked about this recently, saying he could not find a good message body URI parser. Presumably the reason is that it's a little more complicated than it may seem at first, given the need to decode MIME, handle weird cases, etc. I suggested starting with some of the code from SpamCopURI or urirhsbl from the SA 3.0 URIBL module.
I wrote one for my company's use, but I'm not certain I'm able to release it publicly... I'll have to check on that.
A short list of the necessary decodes:
- MIME
- UUencode
- undo MIME word wraps (/=$/, the quoted-printable soft line break)
- URL %HH encoding
- HTML &#decimal; encoding
- HTML &#xhex; encoding
Note: it's a good idea to parse for URL-like things both before and after the MIME and UUencode decoding.
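For illustration, here is a minimal sketch of that decode-then-scan approach. It is written in Python rather than Perl and is not the parser described in this message: the names `URL_RE`, `find_urls`, `extract_urls` and `urls_in_mbox` are made up for the example, the URL pattern is deliberately loose, and UUencode handling is left out. It leans on the standard `email` package for the MIME and quoted-printable steps, `html.unescape` for the HTML character references, and `urllib.parse.unquote` for %HH escapes.

```python
import email
import html
import mailbox
import re
from email import policy
from urllib.parse import unquote

# Deliberately loose pattern (an assumption, not a full RFC 3986 parser).
URL_RE = re.compile(r"""https?://[^\s<>"')\]]+""", re.IGNORECASE)


def find_urls(text):
    """Return URL-like strings found in text, before and after de-obfuscation."""
    found = set(URL_RE.findall(text))
    # Undo HTML character references, then %HH escapes, and scan again
    # so that obfuscated links are caught as well.
    decoded = unquote(html.unescape(text))
    found.update(URL_RE.findall(decoded))
    return found


def extract_urls(raw_message):
    """Collect URL-like strings from one raw message given as bytes."""
    msg = email.message_from_bytes(raw_message, policy=policy.default)
    # Scan the undecoded source first...
    urls = find_urls(raw_message.decode("latin-1"))
    # ...then every text part after base64 / quoted-printable decoding.
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "latin-1"
        try:
            text = payload.decode(charset, errors="replace")
        except LookupError:  # unknown charset name in the header
            text = payload.decode("latin-1")
        urls |= find_urls(text)
    return urls


def urls_in_mbox(path):
    """Union of the URL-like strings found in every message of an mbox file."""
    box = mailbox.mbox(path)
    urls = set()
    for key in box.iterkeys():
        urls |= extract_urls(box.get_bytes(key))
    return urls
```

Scanning the raw source as well as the decoded parts will also report fragments, such as a URL cut off at a quoted-printable soft line break, so the output needs a sanity pass before anything is submitted.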
We wrote a generic message parser that our Perl SMTPD replacement uses, and the command-line tools run against the same code. This means we're always consistent in our interpretation of URLs. That becomes necessary when you're also matching for "traditional" URIs like stock symbols and phone numbers.
SpamAssassin 3.0.0's "mass-check --loguris" works quite well ;)
--j.
Discuss mailing list Discuss@lists.surbl.org http://lists.surbl.org/mailman/listinfo/discuss
Yes, but you have to be very careful with extracted URIs. Many spams come with poisoned URIs. Some of them point to good sites (I even found unesco.org in a porn spam). Some spams come with hundreds of randomly generated URIs, and others come with nonexistent URIs, usually for the unsubscribe web page.
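One possible way to act on that caution, sketched in Python: skip hosts on a hand-maintained whitelist, and only keep hosts that show up in more than one message, so a single spam stuffed with random links contributes little. Everything here is an assumption made for the example, including the names `KNOWN_GOOD`, `host_of`, `is_known_good` and `candidate_hosts`, the whitelist contents, and the `min_messages` threshold. It does not check whether a host actually exists.

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical whitelist; a real one would be longer and maintained by hand.
KNOWN_GOOD = {"unesco.org", "w3.org"}


def host_of(url):
    """Return the lower-cased host name of a URL, or '' if it has none."""
    try:
        return (urlsplit(url).hostname or "").lower()
    except ValueError:  # malformed URL, e.g. an unbalanced '[' in the host
        return ""


def is_known_good(host):
    """True if host is a whitelisted domain or a subdomain of one."""
    return any(host == good or host.endswith("." + good) for good in KNOWN_GOOD)


def candidate_hosts(urls_per_message, min_messages=2):
    """Hosts seen in at least min_messages messages and not whitelisted."""
    seen = Counter()
    for urls in urls_per_message:
        # Count each host once per message so that one spam carrying
        # hundreds of links cannot inflate its own count.
        seen.update({host_of(u) for u in urls} - {""})
    return sorted(h for h, n in seen.items()
                  if n >= min_messages and not is_known_good(h))
```

`urls_per_message` is any iterable with one collection of URL strings per message. The result is still only a list of candidates for manual review, not something to submit blindly.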
Justin Mason wrote:
> SpamAssassin 3.0.0's "mass-check --loguris" works quite well ;)