Hi All,
'Tis my first post to this list... so I'll try to make it a good one.
I have a heap (and will continue to have future heaps) of spam with URIs that don't hit any of the SURBLs. We'll hand-classify the URIs, of course, but are there any objections to scripting the submission against http://www.rulesemporium.com/cgi-bin/uribl.cgi?report=1 ?
The basic idea would be to run one automatic pass that analyzes our SA headers to see whether any of the *_URI_BL rules already matched; if none did, add the message to a processing queue. Then, for each URI in each message of that queue, do the lookup again with Net::DNS, since the URI might have been added to the list in the 24 hours or so since the spam was received. That should leave a comparatively short list of URIs (and spams) that still don't appear in the SURBL. From there, we can hand-classify the remainder (i.e., delete any that aren't spammer sites), have our second-pass script strip the SA markup (if present) from the spam, and automatically submit the hand-picked URIs and their spams via the web interface.
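For the second-pass lookup, I'm picturing something roughly like this -- just a sketch, and it assumes the URI has already been boiled down to its registered domain, since that's the part that actually gets queried against the list:

#!/usr/bin/perl
use strict;
use warnings;
use Net::DNS;

# Second-pass check: given a domain pulled out of a URI, ask
# ws.surbl.org whether it is listed.  A listed domain resolves to an
# address in 127.0.0.0/8; no answer means it isn't (yet) listed.
my $resolver = Net::DNS::Resolver->new;

sub surbl_listed {
    my ($domain) = @_;                    # e.g. "spammersite.example"
    my $reply = $resolver->query("$domain.ws.surbl.org", "A");
    return 0 unless $reply;               # NXDOMAIN => not listed
    for my $rr ($reply->answer) {
        next unless $rr->type eq "A";
        return 1 if $rr->address =~ /^127\./;
    }
    return 0;
}

# Anything still unlisted goes onto the hand-classification pile.
while (my $domain = <STDIN>) {
    chomp $domain;
    print "$domain\n" unless surbl_listed($domain);
}

(If I've read the docs right, numeric IPs get their octets reversed before the lookup; the sketch ignores that case.)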
Questions:

1) Is this approach reasonable? (i.e., am I going to hear screams from someone if I script this, assuming I take precautions, rate-limit the submissions, and check the results before turning it loose?)
2) Is there already a more efficient way to submit URIs? (Besides running my own list, which, I guess, isn't too unreasonable :-)
3) Is there any advantage to submitting the same URI more than once (i.e., from different spam messages)? It seems like the answer is probably "no", but I'll gladly accept enlightenment.
4) Should I be submitting to multiple SURBLs, or just stick with ws.surbl.org?
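To make question 1 a bit more concrete, the submission pass I have in mind would be little more than the following. The form field names ('uri' and 'spam') are placeholders I made up -- I'd obviously pull the real parameter names out of the uribl.cgi form before running anything -- and the sleep is the crude rate limit:

#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

my $report_url = 'http://www.rulesemporium.com/cgi-bin/uribl.cgi?report=1';
my $ua = LWP::UserAgent->new(agent => 'surbl-submitter/0.1');

# @to_submit would be filled by the hand-classification step: each
# entry is a URI plus the de-markup'd spam that contained it.
my @to_submit = ();

for my $item (@to_submit) {
    # NOTE: 'uri' and 'spam' are made-up field names -- check the
    # actual form on the reporting page first.
    my $resp = $ua->post($report_url,
        { uri => $item->{uri}, spam => $item->{spam} });
    warn "submit failed for $item->{uri}: " . $resp->status_line
        unless $resp->is_success;
    sleep 30;    # crude rate limiting between submissions
}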
Since implementing SURBLs in SA 2.63 about a week ago, we've had amazing success. So much so that we're having occasional word-wrap issues with the X-Spam-Level: (stars) header. :-)
Now I want to give something back.
- Ryan