[SURBL-Discuss] RE: (1) Another Possible FP, and (2) header parsing issues

Ryan Thompson ryan at sasknow.com
Mon Aug 16 19:45:57 CEST 2004


Chris Santerre wrote to 'SURBL Discussion list':

> I'm thinking the best way would be to take the actual SURBL code, and use it
> to rip out domains. But the SA code to unencode the email would be needed as
> well. Putting this in the email pipe would be best.
>
> If spam
> 	if not found in SURBL
> 		run SURBL extract + append to file

Here's some sample output from the script I made a couple of months ago:

http://ry.ca/spam/results.html

It already does exactly what you describe, using the SA3 plugin,
ignoring any domains on the whitelists and bl[ao]cklists. It attempts to
display the results in a very readable way to assist with hand-checking.
The "Score" assigned is just a rough heuristic designed to separate
likely spammer domains from poisoning attempts.
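
For anyone who wants to try the same idea without the plugin, here is a
minimal sketch of the "skip what is already listed" step. It is not the
script behind the results page: unlisted_hosts and its arguments are
hypothetical names, plain urlsplit stands in for the SA3 URI extraction,
and it compares whole hostnames rather than chopping to the registered
domain. It also leaves out the score, which is not described here.

```python
from urllib.parse import urlsplit


def unlisted_hosts(uris, whitelist, blacklist):
    """Return hosts from `uris` that are on neither list, in first-seen order.

    Assumption: `whitelist` and `blacklist` are iterables of hostnames.
    """
    known = {h.lower() for h in whitelist} | {h.lower() for h in blacklist}
    seen = set()
    result = []
    for uri in uris:
        try:
            host = urlsplit(uri).hostname  # already lowercased by urlsplit
        except ValueError:
            continue  # malformed URI, e.g. an unterminated "[" in the host
        if not host or host in known or host in seen:
            continue
        seen.add(host)
        result.append(host)
    return result


if __name__ == "__main__":
    sample = [
        "http://Spam.example/x",
        "http://good.example/",
        "http://spam.example/y",
        "http://listed.example/",
    ]
    print(unlisted_hosts(sample, {"good.example"}, {"listed.example"}))
```

Whatever comes back is the set that still needs hand-checking; the
local lists can then be extended between passes.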

It still needs a little work: detecting valid IP URIs
(http://24.0.0.1/) and checking them with the octets reversed
(1.0.0.24.*.surbl.org), and fixing some infrequent TLD chopping issues.
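
The reversed lookup for IP URIs could look roughly like the sketch
below. This is an illustration only, not code from the script:
surbl_query_name is a hypothetical helper, the zone is passed in by the
caller, and multi.surbl.org appears purely as an example value. It only
builds the name to query; it does not do the DNS lookup itself.

```python
import ipaddress
from urllib.parse import urlsplit


def surbl_query_name(uri, zone):
    """Build the reversed-octet DNS name for an IPv4-literal URI.

    Returns None when the URI's host is not a dotted-quad IPv4 address.
    """
    try:
        host = urlsplit(uri).hostname
    except ValueError:
        return None  # malformed URI
    if host is None:
        return None
    try:
        addr = ipaddress.IPv4Address(host)
    except ipaddress.AddressValueError:
        return None  # a hostname, not an IP literal
    # 24.0.0.1 -> 1.0.0.24, then append the list zone.
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


if __name__ == "__main__":
    # Example zone only; substitute whichever list is being checked.
    print(surbl_query_name("http://24.0.0.1/", "multi.surbl.org"))
```

Hostname URIs return None here, so they can go through the normal
domain path instead.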

Even in its current form, it saves me *hours* of hand-checking. It also
supports local whitelists and blacklists, so I generally feed it about
500-1000 spams at a time, depending on how ambitious I feel, and go from
there. I make a few passes, first picking out the poisoning attempts
(most of those are easy to spot in the second list) and malformed URIs,
throwing those into the local whitelist. That usually weeds out over
half of the remaining URIs. I keep making passes in increasing order of
uncertainty until I'm left with about a dozen really icky ones that are
tough to classify. It works well, because by then I'm usually sick of
looking at URIs anyway, and usually those tough ones are best left
alone, to avoid FPs. :-)

I'm really glad I added the NANAS links. I also do this from a "safe"
browser so I can open up the message text (in the second list on the
page, but my site won't give you guys access to that :-)), or click on
the spammer URL to check out their site.

And, the way my mind works, I'll use this a few times and gradually add
more automation as I become simultaneously annoyed with the repetition,
and comfortable with my (previously human) algorithm for automation.

- Ryan

>
> :)
>
> --Chris

-- 
   Ryan Thompson <ryan at sasknow.com>

   SaskNow Technologies - http://www.sasknow.com
   901-1st Avenue North - Saskatoon, SK - S7K 1Y4

         Tel: 306-664-3600   Fax: 306-244-7037   Saskatoon
   Toll-Free: 877-727-5669     (877-SASKNOW)     North America

