RE: [SURBL-Discuss] RE: (1) Another Possible FP, and (2) header parsing issues

17 Aug 2004


      Chris Santerre wrote to 'SURBL Discussion list':
...
I'm thinking the best way would be to take the actual SURBL code, and use it
to rip out domains. But the SA code to unencode the email would be needed as
well. Putting this in the email pipe would be best.
If spam
   if not found in SURBL
      run SURBL extract + append to file
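
A rough Python sketch of the step outlined above, for illustration only. It is not the SURBL or SA code: the regex-based extractor, the `listed` set standing in for a real SURBL lookup, and the names `extract_domains` and `log_unlisted` are all assumptions, and the message body is assumed to be already decoded.

```python
import re
from urllib.parse import urlsplit

# Naive URL matcher (assumption): real mail needs proper MIME/HTML decoding first.
URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def extract_domains(body):
    """Return lowercased hostnames of http(s) URLs in body, in order, no duplicates."""
    hosts = []
    for url in URL_RE.findall(body):
        url = url.rstrip(".,;:!?)")  # drop trailing sentence punctuation
        try:
            host = urlsplit(url).hostname
        except ValueError:
            continue  # malformed URL, skip it
        if host:
            host = host.rstrip(".")
            if host and host not in hosts:
                hosts.append(host)
    return hosts


def log_unlisted(body, is_spam, listed, out_path):
    """If the message is spam, append hostnames not in `listed` to out_path.

    `listed` is a hypothetical stand-in for "found in SURBL".
    Returns the list of hostnames that were appended.
    """
    if not is_spam:
        return []
    new = [d for d in extract_domains(body) if d not in listed]
    if new:
        with open(out_path, "a", encoding="utf-8") as fh:
            for d in new:
                fh.write(d + "\n")
    return new
```

In a mail pipe, `log_unlisted` would be called once per message after the spam verdict is known, so the output file accumulates candidates for hand-checking.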
Here's some sample output from the script I made a couple of months ago:
http://ry.ca/spam/results.html
It already does exactly what you describe, using the SA3 plugin,
ignoring any domains on the whitelists and bl[ao]cklists. It attempts to
display the results in a very readable way to assist with hand-checking.
The "Score" assigned is just a rough heuristic designed to separate
likely spammer domains from poisoning attempts.
It still needs a little work... like detecting valid IP URIs
(http://24.0.0.1/) and checking them in reverse (1.0.0.24.*.surbl.org),
and fixing some infrequent TLD chopping issues.
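
For illustration, the reversed-IP check mentioned above might look something like this in Python. This is only a sketch, not the script's code: the function name `reversed_ip_query` is made up, and the default zone `multi.surbl.org` is an assumption standing in for the `*` in the example.

```python
import ipaddress
from urllib.parse import urlsplit


def reversed_ip_query(uri, zone="multi.surbl.org"):
    """Build the reversed-octet lookup name for a URI whose host is a dotted-quad IPv4.

    Returns None when the host is missing or is not a plain IPv4 address.
    The zone default is an assumption; pass whichever list is being queried.
    """
    try:
        host = urlsplit(uri).hostname
    except ValueError:
        return None  # malformed URI
    if host is None:
        return None
    try:
        addr = ipaddress.IPv4Address(host)
    except ValueError:
        return None  # hostname, not an IP address
    octets = str(addr).split(".")
    return ".".join(reversed(octets)) + "." + zone


print(reversed_ip_query("http://24.0.0.1/"))  # 1.0.0.24.multi.surbl.org
```

Only strict dotted-quad hosts are recognised here; hostnames fall through to the normal domain handling.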
Even in its current form, it saves me *hours* of hand-checking. It also
supports local whitelists and blacklists, so I generally feed it about
500-1000 spams at a time, depending on how ambitious I feel, and go from
there. I make a few passes, first picking out the poisoning attempts
(most of those are easy to spot in the second list) and malformed URIs,
throwing those into the local whitelist. That usually weeds out over
half of the remaining URIs. I keep making passes in increasing order of
uncertainty until I'm left with about a dozen really icky ones that are
tough to classify. It works well, because by then I'm usually sick of
looking at URIs anyway, and usually those tough ones are best left
alone, to avoid FPs. :-)
I'm really glad I added the NANAS links. I also do this from a "safe"
browser so I can open up the message text (in the second list on the
page, but my site won't give you guys access to that :-)), or click on
the spammer URL to check out their site.
And, the way my mind works, I'll use this a few times and gradually add
more automation as I become simultaneously annoyed with the repetition,
and comfortable with my (previously human) algorithm for automation.
- Ryan
...
:)
--Chris
-- 
   Ryan Thompson ryan@sasknow.com

   SaskNow Technologies - http://www.sasknow.com
   901-1st Avenue North - Saskatoon, SK - S7K 1Y4

         Tel: 306-664-3600   Fax: 306-244-7037   Saskatoon
   Toll-Free: 877-727-5669     (877-SASKNOW)     North America
