Daryl C. W. O'Shea wrote:
On 9/22/2006 5:35 AM, opencomputing@gmail.com wrote:
Or should I look up the real target of the URL and look it up too? But this would lead to latencies of around 10-20 seconds depending on a 3rd party web server like geocities in the above example :(
You have to be careful in deciding what URLs found in mail are safe to query. Querying all URLs could have bad consequences, such as confirming a subscription or unsubscription to a list, confirming a transaction that the intended recipient may or may not actually want confirmed, or just plain verifying an address.
In any case, Geocities URLs are usually pretty safe to query. I usually strip query parameters from the URLs though.
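For illustration, stripping the query part before fetching could look roughly like this in Python. This is only a sketch: the function name and the example URL are made up, and it is not taken from any existing plugin.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url: str) -> str:
    """Return the URL with its query string and fragment removed.

    Dropping these avoids sending recipient-specific tokens (for example
    a tracking id or an encoded address) back to the web server.
    """
    parts = urlsplit(url)
    # Keep scheme, host and path; blank out query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


# Hypothetical example URL, not one seen in real spam.
print(strip_query("http://www.geocities.com/someuser/?id=123&u=abc#top"))
```

Only the scheme, host and path are kept, so the request looks the same no matter which recipient the spam was addressed to.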
As for then querying the URLs being redirected to via these web pages, I wouldn't bother. More than half are already JavaScript-encoded, so unless you're looking to run JavaScript in a sandbox to get the URL, you can't safely do it. It's also not necessary: content filtering of the web pages works very well. Yahoo! has been good about shutting new ones down lately too, so even just getting a 403 is a big spam sign. If you're using SpamAssassin, there's a plugin to do all this.
A year ago, when the Geocities problem was much bigger, I made a simple analysis tool.
I don't maintain it anymore, and the data is partial, but it shows that the JavaScript encoding used is nearly always very basic and easy to decode:
http://nospam.mailpeers.net/alive_spammy2.txt
Most of the listed sites appeared in spam a long time ago and were never removed by Geocities.
The original site, http://nospam.mailpeers.net/, contains .cf rules, but apart from the generic ones they are not useful anymore, mostly because Yahoo / Geocities finally decided to do the right thing and block the new spam sites, and I no longer actively search for Geocities spamvertized links.