[SURBL-Discuss] probable impact of cid:.* urls in uri_to_domain
Eric Kolve
ekolve at comcast.net
Fri Apr 23 07:53:19 CEST 2004
On Fri, Apr 23, 2004 at 04:15:49PM +0800, Yusuf Goolamabbas wrote:
> Hi. Currently URIDNSBL.pm uses SA's get_uri_list to get a list of URIs
> from a message. The current regex also seems to pick up URIs of the form
> cid:random_characters.
>
> cid:.* seems to refer to Content-IDs, i.e. attachments in the same message.
> When these URIs are run through uri_to_domain, they come back unchanged
> as cid:.*
>
> My feeling is that a message can contain some artificial cid:.* URLs,
> which may skew the set of random domains used for SURBL lookups.
>
> I am not sure whether cid:.* URLs should be returned from get_uri_list()
> at all or whether they should be stripped in uri_to_domain. Quite a few of
> the values after cid: look like host names or domain names.
I did a quick test: cid:.* URLs are not checked against SURBL
in SpamCopURI.
I use the URI module for all URI parsing and then check whether the
parsed object has a host method, which only schemes such as http, ftp,
and gopher actually implement. The cid scheme maps to the internal
URI::_foreign type, which has no host implementation.
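
The same idea can be sketched outside Perl. The snippet below is only an
illustration in Python, using the standard urllib.parse module as a stand-in
for the Perl URI module; it is not SpamCopURI's code. The names HOST_SCHEMES,
lookup_host and hosts_to_check are hypothetical, and the scheme list is an
assumption chosen for the example.

```python
from urllib.parse import urlsplit

# Assumption: schemes treated as carrying a host. This list is illustrative
# and is not taken from SpamCopURI or the Perl URI module.
HOST_SCHEMES = {"http", "https", "ftp", "gopher"}


def lookup_host(uri):
    """Return the lowercased host for host-bearing schemes, else None."""
    parts = urlsplit(uri)
    if parts.scheme not in HOST_SCHEMES:
        # cid:, mailto:, and similar schemes have no host to look up.
        return None
    return parts.hostname


def hosts_to_check(uris):
    """Collect unique hosts in first-seen order, skipping host-less URIs."""
    hosts = []
    for uri in uris:
        host = lookup_host(uri)
        if host and host not in hosts:
            hosts.append(host)
    return hosts
```

With this kind of filter, a URI such as cid:image001@spammer.example never
reaches the lookup list, even though the text after cid: looks like a domain.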
--eric
>
> Regards, Yusuf