On Fri, Apr 23, 2004 at 04:15:49PM +0800, Yusuf Goolamabbas wrote:
Hi, Currently URIDNSBL.pm uses SA's get_uri_list to get a list of URIs from a message. The current regex also seems to pick up URIs of the form cid:random_characters.
cid:.* seems to refer to content IDs, i.e. attachments in the same message. When these URIs are run through uri_to_domain, they come back unchanged as cid:.*
My feeling is that a message can contain some artificial cid:.* URLs, which may skew the set of random domains used for SURBL lookups.
I am not sure whether cid:.* URLs should be returned from get_uri_list() at all, or whether they should be stripped in uri_to_domain. Quite a few of the values after cid: seem to refer to host names/domain names.
I did a quick test, and cid:.* URLs are not checked against SURBL in SpamCopURI.
I use the URI module to do all the URI parsing and then check whether the resulting object has a host method, which only schemes such as http, ftp, gopher, etc. actually implement. The cid scheme translates to an internal _foreign URI type, which has no host implementation.
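
For illustration only, here is a rough analogue of that check in Python. This is not the SpamCopURI code and it does not use the Perl URI module; it assumes Python's standard urllib.parse, where a URI without a "//" authority part simply has no hostname. The function name hosts_to_check and the sample URIs are hypothetical.

```python
from urllib.parse import urlsplit


def hosts_to_check(uris):
    """Return the host of every URI that actually has one.

    Hypothetical helper: URIs such as cid:... carry no authority
    component, so they yield no hostname and are skipped.
    """
    hosts = []
    for uri in uris:
        try:
            host = urlsplit(uri).hostname
        except ValueError:
            # Malformed URI (e.g. a broken IPv6 literal): skip it.
            continue
        if host:
            hosts.append(host)
    return hosts


# Made-up sample input; the cid: entry mimics a content-id reference.
uris = [
    "http://www.example.com/buy-now",
    "cid:part1.06090408@mail.example.net",
    "ftp://files.example.org/pub/",
]
print(hosts_to_check(uris))
```

With this approach the cid: entry never reaches the lookup stage, even though the text after cid: looks like it contains a domain name.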
--eric
Regards, Yusuf
_______________________________________________
Discuss mailing list
Discuss@lists.surbl.org
http://lists.surbl.org/mailman/listinfo/discuss