At 11:43 19/04/2004, John Fawcett wrote:
Just wondering whether it's a good idea to put in so many highly specific workarounds for current redirection techniques and sites? Wouldn't it be better to try to handle most cases more generically? Otherwise we're forever playing catch-up with the spammers...
You're absolutely right. I am hoping that the seasoned SA and Perl developers will come up with suitable code revisions for version 3 of Mail::SpamAssassin.
One suggestion (see http://bugzilla.spamassassin.org/show_bug.cgi?id=3261) was to use a configuration file parameter for redirection services, which looks promising in terms of future flexibility.
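To make the idea concrete, here is a rough sketch in Python of what a configurable list of redirection services could look like. It is only an illustration under stated assumptions: the `redirect_service` directive name, the config syntax, and the example.com/example.org hosts are all invented here, and none of it is taken from the bug report or from SpamAssassin itself.

```python
import re

# Hypothetical config text. The "redirect_service" directive and its syntax
# are invented for this sketch; each value is a regex whose first capture
# group is the redirect target.
CONFIG = r"""
# redirect_service <regex with one capture group for the target>
redirect_service ^https?://rd\.example\.com/.*?\*(https?://.+)$
redirect_service ^https?://go\.example\.org/click\?url=(.+)$
"""


def load_patterns(config_text):
    """Compile every redirect_service line, skipping blanks and comments."""
    patterns = []
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        directive, _, value = line.partition(" ")
        if directive == "redirect_service" and value.strip():
            patterns.append(re.compile(value.strip(), re.IGNORECASE))
    return patterns


def redirect_target(url, patterns):
    """Return the target captured by the first matching pattern, else None."""
    for pattern in patterns:
        match = pattern.match(url)
        if match:
            return match.group(1)
    return None


patterns = load_patterns(CONFIG)
print(redirect_target("http://rd.example.com/r/*http://spam.example.net/", patterns))
```

The appeal of this shape is that a new redirection service only needs a new config line rather than a code change, though it still means one entry per service.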
For future revisions to the URL parsing code, it's important to take into account what we currently know about URLs that fail to parse. This is my main reason for summarizing them.
Fair enough. Hopefully someone can come up with a more generic way of processing it, as I think most cases of redirection can be handled fairly generically, if we're able to extract multiple URIs from one URL. Apart from % encoding, in all cases I've seen so far (Yahoo, MSN) the final URL is pretty much out in the clear.
I think it will take more than a bunch of regular expressions to handle it though. It will need a custom-written algorithm with a little bit of intelligence, one that can make a few basic deductions based on what we know so far about the different techniques.
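As a starting point for discussion, the generic approach could be sketched roughly like this in Python: look for every embedded `http://` or `https://` inside the URL, undo one layer of % encoding, and repeat until nothing changes. This is a minimal sketch, not SpamAssassin or SpamCopURI code; the function name `embedded_hosts`, the `max_rounds` limit, and the example hosts are assumptions made for illustration.

```python
import re
from urllib.parse import unquote, urlsplit

# Any place where a new absolute URL appears to start.
SCHEME = re.compile(r"https?://", re.IGNORECASE)


def embedded_hosts(url, max_rounds=3):
    """Return the hosts of all URLs found inside `url`, in order of discovery.

    Each round scans for embedded URLs, then strips one layer of % encoding.
    `max_rounds` bounds the work on multiply encoded input.
    """
    hosts = []
    text = url
    for _ in range(max_rounds):
        for match in SCHEME.finditer(text):
            try:
                host = urlsplit(text[match.start():]).hostname
            except ValueError:
                continue  # malformed candidate, e.g. a broken IPv6 literal
            if host and host not in hosts:
                hosts.append(host)
        decoded = unquote(text)
        if decoded == text:
            break  # nothing left to decode
        text = decoded
    return hosts


# Target in the clear, and target hidden behind % encoding.
print(embedded_hosts("http://rd.example.com/r/*http://spam.example.net/"))
print(embedded_hosts("http://rd.example.com/click?u=http%3A%2F%2Fspam.example.net%2Fpage"))
```

Every host found this way could then be checked against the blocklist, so the redirector's own host no longer shields the final one. Obfuscation beyond % encoding is not handled here.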
The more pressing task at the moment is to actually verify that the examples I collected as failing in version 2.63 really do not work with version 3, and then to open bug reports/RFEs so that they can be officially logged as open SA issues. At the moment only case 1 is open. Is anyone in a position to do this?
Not me unfortunately, since I run 2.63, but I would be able to test patches to the SpamCopURI plugin.
Also, it will be interesting to continue monitoring the characteristics of URLs that are going undetected by SA and feed them back to the SA developer list.
I should collate a list of sample redirected URLs. I'll see if I can find any not already mentioned...
Regards, Simon