Christiaan den Besten wrote to 'Ryan Thompson':
Mail::SpamAssassin::PerMsgStatus::get_uri_list($status), but there were a few other incantations that I did to get the list of URIs down. I have been meaning to publish the script, but things keep getting in the way. I will do that tomorrow (today). Stay tuned!
Check, I see it's 03:xx over there ;) Just woke up here :)
It's released.
I have just looked at Justin's hints for an SA plugin, and that seems very doable as well. I was just wondering if I could re-use the SA SURBL plugin while I am at it, since I am only interested in URIs not yet in WS.
For my idea, what you do now:
- strip uri's from messages
Yes. I also attempt to eliminate those with empty anchors.
- for each (new) URI generate a NANAS query
NANAS query URLs (to Google Groups) are pre-built, but not automatically queried, because that would violate Google TOS. (See the TODO section in the documentation).
- build a 'matrix' between uri's and messages they are referenced in.
More or less, a two-way hash.
- score uri's for spamability :)
Yep. Technically, they're just scored for relevance in the message. It's up to the person building the corpus to decide whether they're spammy or not. :-)
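
To make the "two-way hash" idea above concrete, here is a minimal sketch in Python. It is not the released script (which is Perl and gets its URIs from SpamAssassin's get_uri_list); the regex, the function name build_index, and the sample messages are all assumptions made up for illustration.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately simple URI pattern. The real script relies on
# SpamAssassin to extract URIs; this regex is only a stand-in.
URI_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def build_index(messages):
    """Build a two-way mapping between URIs and the messages that contain them.

    `messages` maps a message id to its body text. Returns a pair:
    (uri -> set of message ids, message id -> set of URIs).
    """
    uri_to_msgs = defaultdict(set)
    msg_to_uris = defaultdict(set)
    for msg_id, body in messages.items():
        for uri in URI_RE.findall(body):
            uri_to_msgs[uri].add(msg_id)
            msg_to_uris[msg_id].add(uri)
    return uri_to_msgs, msg_to_uris


# Made-up sample corpus, for illustration only.
messages = {
    "msg1": "Buy now at http://example.com/pills and http://example.net/",
    "msg2": "Again: http://example.com/pills",
}
uri_to_msgs, msg_to_uris = build_index(messages)
```

With both directions stored, "which messages mention this URI?" and "which URIs does this message contain?" are each a single lookup, which is roughly what the matrix between URIs and messages needs.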
- Ryan