[SURBL-Discuss] probable impact of cid:.* urls in uri_to_domain

Eric Kolve ekolve at comcast.net
Fri Apr 23 07:53:19 CEST 2004


On Fri, Apr 23, 2004 at 04:15:49PM +0800, Yusuf Goolamabbas wrote:
> Hi. Currently URIDNSBL.pm uses SA's get_uri_list to get a list of URIs
> from a message. The current regex also seems to pick up URIs of the form
> cid:random_characters.
> 
> cid:.* seems to refer to Content-IDs (attachments in the same message).
> When these URIs are run through uri_to_domain, they come back unchanged
> as cid:.*
> 
> My feeling is that a message could contain artificial cid:.* URIs,
> which may skew the set of random domains used for SURBL lookups.
> 
> I am not sure whether cid:.* URIs should be returned from get_uri_list()
> at all, or whether they should be stripped in uri_to_domain. Quite a few
> of the values after cid: look like host names or domain names.

I did a quick test: cid:.* URIs are not checked against SURBL
in SpamCopURI.

I use the URI module for all URI parsing and then check whether the
resulting object has a host method, which only schemes such as http, ftp,
and gopher actually implement.  The cid scheme maps to URI's internal
_foreign type, which has no host implementation.
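
The check described above can be sketched roughly as follows. This is a
Python analogue built on the standard library's urllib.parse, not
SpamCopURI's actual Perl code; the names HOST_SCHEMES, host_for_lookup and
lookup_candidates are made up for illustration, and the scheme allow-list
is an assumption standing in for "schemes whose URI class has a host
method".

```python
from urllib.parse import urlsplit

# Assumption: an explicit allow-list stands in for "schemes that implement
# a host method" in the Perl URI module.
HOST_SCHEMES = {"http", "https", "ftp", "gopher"}


def host_for_lookup(uri):
    """Return the host of a URI, or None if its scheme carries no host."""
    parts = urlsplit(uri)
    # urlsplit lower-cases the scheme; cid:, mailto:, etc. are rejected here.
    if parts.scheme not in HOST_SCHEMES:
        return None
    # hostname is None when the URI has no authority component.
    return parts.hostname


def lookup_candidates(uris):
    """Collect unique hosts, in first-seen order, skipping host-less URIs."""
    hosts = []
    for uri in uris:
        host = host_for_lookup(uri)
        if host and host not in hosts:
            hosts.append(host)
    return hosts
```

With this kind of filter a cid: URI never reaches the lookup set, even
when the text after cid: happens to look like a domain name.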

--eric



> 
> Regards, Yusuf