[SURBL-Discuss] Auto adding detected "static 'front page' sites".
Ryan Thompson
ryan at sasknow.com
Mon Aug 23 12:20:03 CEST 2004
Christiaan den Besten wrote to 'Ryan Thompson':
>> Mail::SpamAssassin::PerMsgStatus::get_uri_list($status), but there
>> were a few other incantations that I did to get the list of URIs
>> down. I have been meaning to publish the script, but things keep
>> getting in the way. I will do that tomorrow (today). Stay tuned!
>
> Check, I see it's 03:xx over there ;) Just woke up here :)
It's released.
http://ry.ca/geturi/
> I have just looked at Justin's hints for a SA plugin, that seems very doable
> as well. I was just wondering if I could re-use the SA surbl-plugin while I
> am at it. For I am only interested in uri's not yet in WS.
>
> For my idea, what you do now:
> - strip uri's from messages
Yes. I also attempt to eliminate those with empty anchors.
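
To illustrate that step, here is a minimal sketch in Python. It is not the
geturi code; it assumes the message body is HTML and that "empty anchor"
means an <a> element with no visible text. The names AnchorCollector and
extract_uris are hypothetical.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect href values from <a> tags whose anchor text is non-empty."""

    def __init__(self):
        super().__init__()
        self.uris = []      # hrefs that had visible anchor text
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Assumption: whitespace-only anchor text counts as empty.
            # An anchor wrapping only an image is also treated as empty here.
            if "".join(self._text).strip():
                self.uris.append(self._href)
            self._href = None


def extract_uris(html):
    """Return hrefs of anchors with non-empty text, in document order."""
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    return parser.uris
```

Whether image-only anchors should count as empty is a policy choice; this
sketch drops them.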
> - for each (new) uri generate a NASAS query
NANAS query URLs (to Google Groups) are pre-built, but they are not queried
automatically, because automated querying would violate Google's TOS. (See
the TODO section in the documentation.)
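
Pre-building such a URL without fetching it could look roughly like the
Python sketch below. The search endpoint, the "group:" operator and the
function name nanas_query_url are assumptions for illustration, not taken
from geturi; NANAS is news.admin.net-abuse.sightings.

```python
from urllib.parse import urlencode, urlsplit

# Assumed search endpoint and query syntax; adjust to the archive you use.
GROUPS_SEARCH = "http://groups.google.com/groups"
NANAS_GROUP = "news.admin.net-abuse.sightings"


def nanas_query_url(uri):
    """Build (but never fetch) a search URL for the URI's host in NANAS."""
    # Fall back to the raw string if no host can be parsed out of it.
    host = urlsplit(uri).hostname or uri
    query = {"q": f"{host} group:{NANAS_GROUP}"}
    return GROUPS_SEARCH + "?" + urlencode(query)
```

The function only returns a string. A person can then open the link by
hand, so nothing is queried automatically.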
> - build a 'matrix' between uri's and messages they are referenced in.
More or less. It's a two-way hash: URI to messages, and message to URIs.
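
A two-way hash of this kind can be sketched in Python as two dictionaries
kept in step. This is an illustration only; the class name UriIndex and
its attribute names are hypothetical.

```python
from collections import defaultdict


class UriIndex:
    """Two-way mapping between message IDs and the URIs they contain."""

    def __init__(self):
        self.messages_by_uri = defaultdict(set)  # uri -> {message ids}
        self.uris_by_message = defaultdict(set)  # message id -> {uris}

    def add(self, message_id, uri):
        # Record the pair in both directions so either lookup is one step.
        self.messages_by_uri[uri].add(message_id)
        self.uris_by_message[message_id].add(uri)
```

For example, after add("msg1", "http://a.example/") and
add("msg2", "http://a.example/"), messages_by_uri["http://a.example/"]
holds both message IDs.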
> - score uri's for spamability :)
Yep. Technically, they're just scored for relevance in the message. It's
up to the person building the corpus to decide whether they're spammy or
not. :-)
- Ryan
--
Ryan Thompson <ryan at sasknow.com>
SaskNow Technologies - http://www.sasknow.com
901-1st Avenue North - Saskatoon, SK - S7K 1Y4
Tel: 306-664-3600 Fax: 306-244-7037 Saskatoon
Toll-Free: 877-727-5669 (877-SASKNOW) North America