[SURBL-Discuss] [long] summary of currently unparsed url types

Simon Byrnand simon at igrin.co.nz
Mon Apr 19 12:54:24 CEST 2004

At 11:43 19/04/2004, John Fawcett wrote:

> > Just wondering whether it's a good idea to put in so many highly specific
> > workarounds for current redirection techniques and sites? Wouldn't it
> > be better to try to handle most cases more generically? Otherwise we're
> > forever playing catch-up with the spammers...
>You're absolutely right. I am hoping that the seasoned SA and
>Perl developers will come up with suitable code revisions
>for version 3 of Mail::SpamAssassin.
>One suggestion (see http://bugzilla.spamassassin.org/show_bug.cgi?id=3261)
>was to use a configuration file parameter for redirection
>services, which looks promising in terms of future flexibility.
>For future revisions to the URL parsing code, it's important to take
>into account our current knowledge of URLs which are failing to be
>parsed. This is my main reason for summarizing them.

Fair enough. Hopefully someone can come up with a more generic way of
processing redirects. I think most cases of redirection can be handled
fairly generically, provided we're able to extract multiple URIs from one
URL. Apart from percent-encoding, in all the cases I've seen so far (Yahoo,
MSN) the final URL is pretty much out in the clear.
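
To make the "multiple URIs from one URL" idea concrete, here is a minimal sketch. It is in Python rather than Perl purely for illustration and is not taken from SpamAssassin or SpamCopURI. The function name, the decode limit, and the example hostnames are all made up, and it assumes only http/https targets matter.

```python
import re
from urllib.parse import unquote

# Assumption: only embedded http:// and https:// targets are of interest.
_SCHEME = re.compile(r"https?://", re.IGNORECASE)

def extract_embedded_urls(url, max_decode_rounds=3):
    """Return every URL found inside `url`, outermost first (hypothetical helper)."""
    # Undo percent-encoding, repeating in case the target was encoded more than once.
    for _ in range(max_decode_rounds):
        decoded = unquote(url)
        if decoded == url:
            break
        url = decoded
    # Each embedded URL runs from its scheme to the start of the next one.
    starts = [m.start() for m in _SCHEME.finditer(url)]
    ends = starts[1:] + [len(url)]
    return [url[s:e] for s, e in zip(starts, ends)]

# Made-up redirector URL with a percent-encoded target appended to the path.
sample = "http://redir.example.com/click/*http%3A%2F%2Fspam.example.net/buy"
print(extract_embedded_urls(sample))
```

Each piece could then be parsed separately and its hostname checked, so the final destination gets looked up as well as the redirector.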

I think it will take more than a bunch of regular expressions to handle
it, though. It will need a custom-written algorithm with a little bit of
intelligence, one that can make a few basic deductions based on what we
know so far about the different techniques.

>The more pressing task at the moment is to actually
>verify that the examples I collected as failing in version 2.63
>really do not work with version 3, and then to open
>bug reports/RFEs so that they can be officially logged as
>open SA issues. At the moment only case 1 is open.
>Is anyone in a position to do this?

Not me, unfortunately, since I run 2.63, but I would be able to test
patches to the SpamCopURI plugin.

>Also, it will be interesting to continue monitoring the
>characteristics of URLs that are going undetected by
>SA and feed them back to the SA developer list.

I should collate a list of sample redirected URLs. I'll see if I can find
any not already mentioned...

