On Tuesday, June 8, 2004, 11:09:54 PM, Yusuf Goolamabbas wrote:
I've filed bug 3467 in SA's bugzilla
suggesting that uri_to_domain discount URIs which don't end in valid TLDs. There are test cases in which SA's get_uri_list can pick up a URI of the form http://random.gif/, which returns random.gif as the domain and feeds it into the pool of candidate domains to check.
I don't know what SpamCopURI's behaviour is with the test cases I've filed.
To be honest, I don't know the exact client behavior either, but philosophically we're original-spam-data-centric. We tend to capture whatever URIs are presented, and on occasion those can be bogus. But bogus URIs will likely be in the minority, since using them probably does the spammer little good.
Most of the code on the data and client sides probably doesn't attempt to determine valid TLDs. The systems are kept relatively open-ended to deal organically with variability that occurs naturally, for example when a new TLD is created. It's possible that this openness could cause problems, but my take is that things will generally work themselves out. Spammers should not have much incentive to load down their messages with broken URIs. Of course, if this causes any major problems we would like to know about it.
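For illustration only, the kind of TLD check proposed in the bug might look roughly like the sketch below. It is written in Python and is not SpamAssassin's or SpamCopURI's actual code; the function name domain_if_valid_tld and the tiny KNOWN_TLDS set are hypothetical stand-ins. It also shows the trade-off mentioned above: a fixed list has to be updated whenever a new TLD is created.

```python
from urllib.parse import urlsplit

# Hypothetical, deliberately tiny sample. A real list would be much longer
# and would need updating whenever a new TLD is created.
KNOWN_TLDS = {"com", "net", "org", "info", "biz", "uk", "de"}


def domain_if_valid_tld(uri, known_tlds=KNOWN_TLDS):
    """Return the URI's host name, or None if its last label is not a known TLD."""
    host = urlsplit(uri).hostname  # lowercased host, or None if there is none
    if not host:
        return None
    # Take the label after the final dot, ignoring a trailing root dot.
    tld = host.rstrip(".").rsplit(".", 1)[-1]
    return host if tld in known_tlds else None


candidates = ["http://random.gif/", "http://www.example.com/page"]
# Keep only hosts whose last label passed the TLD check.
domains = [d for d in map(domain_if_valid_tld, candidates) if d]
```

With this sketch, http://random.gif/ is dropped because "gif" is not in the sample set, while www.example.com is kept. Note that it returns the full host name rather than a registered domain, and that hosts given as bare IP addresses would also be dropped unless handled separately.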
Jeff C.