probable impact of cid:.* urls in uri_to_domain


Yusuf Goolamabbas

23 Apr 2004

10:15 a.m.

Hi, currently URIDNSBL.pm uses SA's get_uri_list to get a list of URIs from a message. The current regex also seems to pick up URIs of the form cid:random_characters.

cid:.* seems to refer to content-ids (attachments in the same message). When these URIs are run through uri_to_domain, they come back unchanged as cid:.*

My feeling is that a message can contain some artificial cid:.* URLs, which may skew the set of random domains used for SURBL lookups.

I am not sure whether cid:.* URLs should not be returned from get_uri_list() at all, or whether they should instead be stripped in uri_to_domain. Quite a few of the values after cid: look like host names or domain names.

Regards, Yusuf


Jeff Chan

23 Apr

10:35 a.m.

On Friday, April 23, 2004, 1:15:49 AM, Yusuf Goolamabbas wrote:

> Hi, currently URIDNSBL.pm uses SA's get_uri_list to get a list of URIs from a message. The current regex also seems to pick up URIs of the form cid:random_characters.
>
> cid:.* seems to refer to content-ids (attachments in the same message). When these URIs are run through uri_to_domain, they come back unchanged as cid:.*
>
> My feeling is that a message can contain some artificial cid:.* URLs, which may skew the set of random domains used for SURBL lookups.
>
> I am not sure whether cid:.* URLs should not be returned from get_uri_list() at all, or whether they should instead be stripped in uri_to_domain. Quite a few of the values after cid: look like host names or domain names.

I'll leave a detailed response to those more familiar with URIDNSBL internals, but the goal is to remove all but the base domain before comparing it to an SURBL. So I'm hoping any deliberately randomized characters and any other extra stuff is discarded before RBL comparison. Only the basic domain should be checked against the SURBL.

Jeff C.
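
As a rough illustration of the "base domain only" goal described above, the following Python sketch reduces a host name to its registrable part. The function name `base_domain` and the tiny `TWO_LEVEL_SUFFIXES` set are assumptions made for this example; they are not taken from URIDNSBL.pm or SURBL, and a real implementation would need a complete list of multi-label suffixes.

```python
# Hypothetical, deliberately incomplete list of two-label public suffixes.
TWO_LEVEL_SUFFIXES = {"co.uk", "com.au", "co.jp"}


def base_domain(host):
    """Strip extra leading labels so only the base domain remains.

    Keeps the last two labels, or the last three when the host ends in
    one of the two-label suffixes listed above.
    """
    labels = host.lower().rstrip(".").split(".")
    if len(labels) >= 3 and ".".join(labels[-2:]) in TWO_LEVEL_SUFFIXES:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])


# Randomized subdomain labels are discarded before any list lookup.
print(base_domain("x9f3k2.www.example.com"))
print(base_domain("promo.example.co.uk"))
```

With a helper along these lines, any randomized prefix a spammer adds in front of the domain has no effect on what is compared against the list.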

Yusuf Goolamabbas

11:20 a.m.

> I'll leave a detailed response to those more familiar with URIDNSBL internals, but the goal is to remove all but the base domain before comparing it to an SURBL. So I'm hoping any deliberately randomized characters and any other extra stuff is discarded before RBL comparison. Only the basic domain should be checked against the SURBL.

Currently, SURBL relies on get_uri_list to grab the list of domains, but some URIs may not be appropriate as a basis for extracting domains. If that list could be cut down, the pool from which the random selection is made would be more interesting.

E.g., I could write a message with maybe 25-30 cid:.* URLs and one real spamvertised URL. The probability of URIDNSBL.pm picking the spamvertised URL would be higher if the noise from the cid:.* URLs and other uninteresting URLs were removed.

PS: Does this list need to have the list name prefixed to the subject line? It wastes a lot of space. I am sure there are other headers one can filter by.

Regards, Yusuf
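
The dilution argument in the message above can be put into numbers with a small Python sketch. It assumes the candidates are sampled uniformly without replacement; the thread only says the selection is random, so that model, the function name `hit_probability`, and the sample size of 20 are all hypothetical.

```python
from fractions import Fraction
from math import comb


def hit_probability(pool_size, sample_size, targets=1):
    """Chance that a uniform sample without replacement contains at
    least one of `targets` spamvertised entries in the pool."""
    if pool_size <= 0 or targets <= 0:
        return Fraction(0)
    targets = min(targets, pool_size)
    k = min(sample_size, pool_size)
    # P(miss) = C(n - t, k) / C(n, k); comb() returns 0 when k > n - t.
    miss = Fraction(comb(pool_size - targets, k), comb(pool_size, k))
    return 1 - miss


# 30 cid: entries plus one real URL, with an assumed sample size of 20.
print(hit_probability(31, 20))
# Same message once the cid: entries are filtered out of the pool.
print(hit_probability(1, 20))
```

Under these assumptions the spamvertised URL is checked with probability 20/31 when the cid: entries stay in the pool, and with certainty once they are removed.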

Eric Kolve

3:53 p.m.

On Fri, Apr 23, 2004 at 04:15:49PM +0800, Yusuf Goolamabbas wrote:

> Hi, currently URIDNSBL.pm uses SA's get_uri_list to get a list of URIs from a message. The current regex also seems to pick up URIs of the form cid:random_characters.
>
> cid:.* seems to refer to content-ids (attachments in the same message). When these URIs are run through uri_to_domain, they come back unchanged as cid:.*
>
> My feeling is that a message can contain some artificial cid:.* URLs, which may skew the set of random domains used for SURBL lookups.
>
> I am not sure whether cid:.* URLs should not be returned from get_uri_list() at all, or whether they should instead be stripped in uri_to_domain. Quite a few of the values after cid: look like host names or domain names.

I did a quick test and cid:.* URLs are not checked against SURBL in SpamCopURI.

I use the URI module to do all the URI parsing and then check whether the result has a host method, which only schemes such as http, ftp, gopher, etc. actually implement. The cid scheme maps to the internal _foreign URI type, which has no host implementation.

--eric
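
The host-based filtering described in the message above can be approximated in Python with the standard library. This is only an analogous sketch, not SpamCopURI's Perl code: `hosts_for_lookup` is a hypothetical name, and `urllib.parse.urlsplit` stands in for the parser by reporting no hostname for URIs that lack a `//host` part, such as cid: and mailto:.

```python
from urllib.parse import urlsplit


def hosts_for_lookup(uris):
    """Return host names only for URIs that actually carry one.

    URIs without a network location (cid:, mailto:, ...) have no
    hostname after parsing and are skipped.
    """
    hosts = []
    for uri in uris:
        try:
            host = urlsplit(uri).hostname
        except ValueError:
            # Malformed input (for example a broken IPv6 literal) is ignored.
            continue
        if host:
            hosts.append(host)
    return hosts


sample = [
    "cid:part1.0123@example.com",
    "http://www.example.com/buy",
    "mailto:someone@example.org",
]
print(hosts_for_lookup(sample))
```

Only the http entry survives here, so the cid: value never reaches the domain extraction step even though it contains something that looks like a domain.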

