[SURBL-Discuss] [long] summary of currently unparsed url
simon at igrin.co.nz
Mon Apr 19 12:54:24 CEST 2004
At 11:43 19/04/2004, John Fawcett wrote:
> > Just wondering whether it's a good idea putting so many highly specific
> > workarounds in for current redirection techniques and sites? Wouldn't it
> > be better to try and handle most cases more generically? Otherwise we're
> > forever playing catch-up with the spammers...
>You're absolutely right. I am hoping that the seasoned SA and
>Perl developers will come up with suitable code revisions
>for version 3 of Mail::SpamAssassin.
>One suggestion (see http://bugzilla.spamassassin.org/show_bug.cgi?id=3261)
>was to use a configuration file parameter for redirection
>services, which looks promising in terms of future flexibility.
>For the future revisions to the URL parsing code, it's important to take
>into account our current knowledge of URLs which are failing to be
>parsed. This is my main reason for summarizing them.
Fair enough. Hopefully someone can come up with a more generic way of
processing it, as I think most cases of redirection can be handled fairly
generically, if we're able to extract multiple URIs from one URL. Apart
from % encoding, in all cases I've seen so far (yahoo, msn) the final URL
is pretty much out in the clear.
I think it will take more than a bunch of regular expressions to handle it,
though. It will need a custom-written algorithm with a little bit of
intelligence, one that can make a few basic deductions based on what we
know so far about the different techniques.
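
As a rough illustration of the generic approach described above, here is a
minimal Python sketch. It is not SpamAssassin or SpamCopURI code; the
function name embedded_hosts, the max_decode_rounds parameter and the sample
URLs are all made up for the example. It assumes that the final URL sits in
the clear inside the redirector URL, possibly behind one or more layers of
% encoding, and it simply collects every host name it can find.

```python
import re
from urllib.parse import unquote

# Scheme, optional "user@" part, then the host name (captured).
# Hypothetical pattern for illustration; it ignores ports and IP literals.
_HOST_RE = re.compile(
    r"https?://(?:[^\s/?#@]*@)?([a-z0-9][a-z0-9.-]*)",
    re.IGNORECASE,
)


def embedded_hosts(url, max_decode_rounds=3):
    """Return every host found anywhere inside `url`, in order of discovery.

    The URL is scanned as-is, then %-decoded and scanned again, until
    decoding changes nothing or max_decode_rounds is reached.
    """
    hosts = []
    text = url
    for _ in range(max_decode_rounds + 1):
        # Look for a scheme anywhere in the string, not only at the start,
        # so a target URL appended to a redirector URL is also found.
        for match in _HOST_RE.finditer(text):
            host = match.group(1).lower().rstrip(".")
            if host and host not in hosts:
                hosts.append(host)
        decoded = unquote(text)
        if decoded == text:
            break  # nothing left to decode
        text = decoded
    return hosts


if __name__ == "__main__":
    samples = [
        "http://redirector.example/go/*http://spammer.example/page",
        "http://redirector.example/r?u=http%3A%2F%2Fspammer.example%2Fpage",
    ]
    for sample in samples:
        print(embedded_hosts(sample))
```

Every host returned could then be checked against the blocklist, rather than
only the first one. A sketch like this would still miss targets written
without a scheme, so it is a starting point rather than the whole algorithm.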
>The more pressing task at the moment is to actually
>verify that the examples I collected as failing in version 2.63
>really do not work with version 3, and then to open
>bug reports/RFEs so that they can be officially logged as
>open SA issues. At the moment only case 1 is open.
>Is anyone in a position to do this?
Not me unfortunately, since I run 2.63, but I would be able to test patches
to the SpamCopURI plugin.
>Also, it will be interesting to continue monitoring the
>characteristics of URLs that are going undetected by
>SA and feed them back to the SA developer list.
I should collate a list of sample redirected URLs. I'll see if I can find
any not already mentioned...