[SURBL-Discuss] Auto adding detected "static 'front page' sites".

Ryan Thompson ryan at sasknow.com
Mon Aug 23 12:20:03 CEST 2004


Christiaan den Besten wrote to 'Ryan Thompson':

>> Mail::SpamAssassin::PerMsgStatus::get_uri_list($status), but there
>> were a few other incantations that I did to get the list of URIs
>> down. I have been meaning to publish the script, but things keep
>> getting in the way.  I will do that tomorrow (today). Stay tuned!
>
> Check, I see it's 03:xx over there ;) Just woke up here :)

It's released.

http://ry.ca/geturi/

> I have just looked at Justin's hints for an SA plugin; that seems very doable
> as well. I was just wondering if I could re-use the SA SURBL plugin while I
> am at it, since I am only interested in URIs not yet in WS.
>
> For my idea, what you do now:
> - strip URIs from messages

Yes. I also attempt to eliminate those with empty anchors.
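
For illustration only, here is one way the empty-anchor filter could be
sketched. This is Python rather than the Perl that geturi is built on, and
it is not the script's actual code. The names AnchorCollector and
uris_with_visible_anchor are hypothetical, and "empty" is assumed to mean
"no visible text between <a> and </a>", so image-only links are dropped
too.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect (href, visible_text) pairs for every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs, in document order
        self._href = None    # href of the anchor currently open, if any
        self._text = []      # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # Anchors without an href leave _href as None and are ignored.
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def uris_with_visible_anchor(html):
    """Return hrefs whose anchor text is non-empty (assumed definition)."""
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    return [href for href, text in parser.links if text]
```

Feeding it a message body with one normal link and one link whose anchor
text is empty or whitespace-only should return just the first href.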

> - for each (new) URI generate a NANAS query

NANAS query URLs (to Google Groups) are pre-built but not queried
automatically, because that would violate Google's TOS. (See the TODO
section in the documentation.)
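
As a rough sketch of what "pre-built" means here (Python for illustration,
not the script's own code): the URL is only constructed, never fetched.
The search endpoint, the use of the "group:" operator, and the function
name nanas_query_url are assumptions, not taken from geturi.

```python
from urllib.parse import urlencode

# NANAS is the Usenet group news.admin.net-abuse.sightings.
NANAS_GROUP = "news.admin.net-abuse.sightings"

# Assumed Google Groups search endpoint; adjust to whatever the archive uses.
GROUPS_SEARCH = "http://groups.google.com/groups"


def nanas_query_url(domain):
    """Build (but do not fetch) a search URL for `domain` within NANAS."""
    query = f"{domain} group:{NANAS_GROUP}"
    return GROUPS_SEARCH + "?" + urlencode({"q": query})
```

The result is a link a person can click by hand, which keeps the tool
itself from sending automated queries.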

> - build a 'matrix' between URIs and the messages they are referenced in.

More or less: it's a two-way hash, so you can look up the messages for a
URI or the URIs for a message.
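
A minimal sketch of such a two-way hash, in Python for illustration (the
class and attribute names are hypothetical, and this is not how geturi
itself is written):

```python
from collections import defaultdict


class UriMessageIndex:
    """Two-way mapping between URIs and the messages that reference them."""

    def __init__(self):
        self.messages_by_uri = defaultdict(set)   # uri -> {message ids}
        self.uris_by_message = defaultdict(set)   # message id -> {uris}

    def add(self, message_id, uri):
        """Record that `message_id` references `uri`, in both directions."""
        self.messages_by_uri[uri].add(message_id)
        self.uris_by_message[message_id].add(uri)


index = UriMessageIndex()
index.add("msg-1", "http://a.example/")
index.add("msg-2", "http://a.example/")
index.add("msg-2", "http://b.example/")
```

Storing both directions costs some memory but makes either lookup a
single hash access, and sets absorb repeated references for free.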

> - score URIs for spamability :)

Yep. Technically, they're just scored for relevance in the message. It's
up to the person building the corpus to decide whether they're spammy or
not. :-)

- Ryan

-- 
   Ryan Thompson <ryan at sasknow.com>

   SaskNow Technologies - http://www.sasknow.com
   901-1st Avenue North - Saskatoon, SK - S7K 1Y4

         Tel: 306-664-3600   Fax: 306-244-7037   Saskatoon
   Toll-Free: 877-727-5669     (877-SASKNOW)     North America

