[SURBL-Discuss] probable impact of cid:.* urls in uri_to_domain

Jeff Chan jeffc at surbl.org
Fri Apr 23 02:35:47 CEST 2004

On Friday, April 23, 2004, 1:15:49 AM, Yusuf Goolamabbas wrote:
> Hi. Currently URIDNSBL.pm uses SA's get_uri_list to get a list of URIs
> from a message. The current regex also seems to pick up URIs of the form
> cid:random_characters.

> cid:.* seems to refer to Content-IDs (attachments in the same message).
> When these URIs are run through uri_to_domain, they come back unchanged
> as cid:.*

> My feeling is that a message can contain some artificial cid:.* URLs,
> which may skew the set of random domains used for SURBL lookups.

> I am not sure whether cid:.* URLs should be returned from get_uri_list() or
> whether they should be stripped correctly in uri_to_domain. Quite a few of
> the values after cid: seem to refer to host names/domain names.

I'll leave a detailed response to those more familiar with
URIDNSBL internals, but the goal is to remove all but the
base domain before comparing it to an SURBL.  So I'm hoping
any deliberately randomized characters and any other extra
stuff are discarded before RBL comparison.  Only the basic
domain should be checked against the SURBL.
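
As a rough illustration of that goal, here is a minimal sketch in Python
(not the plugin's Perl, and not SpamAssassin's actual uri_to_domain).
The names base_domain, domains_for_lookup, LOOKUP_SCHEMES and
TWO_LEVEL_TLDS are hypothetical, and the two-level TLD set is a tiny
sample standing in for a complete list.  It assumes that only http and
https URIs should be looked up, so cid: URIs are dropped, and it skips
IP-address hosts rather than handling them.

```python
import ipaddress
from urllib.parse import urlsplit

# Assumption: only web URIs are worth a lookup; cid:, mailto:, etc. are skipped.
LOOKUP_SCHEMES = {"http", "https"}

# Illustrative sample only; a real checker needs a full two-level TLD list.
TWO_LEVEL_TLDS = {"co.uk", "com.au"}


def base_domain(uri):
    """Return the base domain of a URI, or None if it should not be looked up."""
    try:
        parts = urlsplit(uri.strip())
    except ValueError:
        return None  # malformed URI (e.g. a broken IPv6 literal)
    if parts.scheme not in LOOKUP_SCHEMES:
        return None  # drops cid:random_characters and other non-web schemes
    host = parts.hostname  # already lowercased, without port or userinfo
    if not host:
        return None
    try:
        ipaddress.ip_address(host)
        return None  # IP hosts are out of scope for this sketch
    except ValueError:
        pass
    labels = host.rstrip(".").split(".")
    if len(labels) < 2:
        return None
    # Keep three labels under a known two-level TLD, otherwise two.
    two_level = ".".join(labels[-2:]) in TWO_LEVEL_TLDS
    keep = 3 if two_level and len(labels) >= 3 else 2
    return ".".join(labels[-keep:])


def domains_for_lookup(uris):
    """Collect unique base domains, preserving first-seen order."""
    seen = []
    for uri in uris:
        domain = base_domain(uri)
        if domain and domain not in seen:
            seen.append(domain)
    return seen
```

With this filter, an artificial cid: URI never reaches the set of
domains to check, and any randomized host prefix in front of the base
domain is discarded before the lookup.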

Jeff C.
