On Wednesday, April 21, 2004, 3:12:58 PM, Eric Kolve wrote:
On Wed, Apr 21, 2004 at 03:00:52PM -0700, Jeff Chan wrote:
On Wednesday, April 21, 2004, 12:21:16 PM, John Fawcett wrote:
From: "Eric Kolve"
Initially, when I released SpamCopURI I decided to pretty much ignore whether the TLD was a country code or not. This results in about twice as many queries as necessary, but guarantees that you get hits if the domain is listed.
Now that people are pointing this at other RBLs besides just SURBL, should we continue to do second- and third-level queries, or just the query that we assume to be necessary? My concern is that not all RBLs will process the domains according to a list such as http://www.bestregistrar.com/help/ccTLD.htm. I suppose the worst-case scenario is that we end up getting a miss when we should be getting a hit because one side presumes that, say, the TLD .za has a subdomain 'foo' when the server doesn't. The server side would expect a second-level query, while the client would do a third-level query (this is why I wanted the wildcard records). I guess this really isn't that great a consequence, considering the savings and the fact that this shouldn't occur very often.
I will go ahead and make this change if everyone is comfortable with the known risk.
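The current "check both levels" behaviour described above can be sketched roughly as follows. This is a minimal illustration, not SpamCopURI's actual code; the function names and the zone `rbl.example.org` are hypothetical placeholders.

```python
def base_domain(host: str, levels: int) -> str:
    """Return the last `levels` labels of a hostname, lowercased."""
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-levels:])


def query_names_both_levels(host: str, zone: str) -> list[str]:
    """Build DNS query names for the second- and third-level domains of host.

    One query per level, so most hosts cost two lookups even though
    only one of them can be the meaningful registered domain.
    """
    names: list[str] = []
    for levels in (2, 3):
        name = f"{base_domain(host, levels)}.{zone}"
        if name not in names:  # a two-label host yields the same name twice
            names.append(name)
    return names


# Hypothetical zone name, for illustration only.
print(query_names_both_levels("www.shop.co.uk", "rbl.example.org"))
# → ['co.uk.rbl.example.org', 'shop.co.uk.rbl.example.org']
```

For a .co.uk host the second-level query (co.uk) is wasted; for a .com host the third-level query (e.g. www.spammer.com) usually is. That is where the roughly twofold query overhead comes from.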
I think if an RHSBL is listing a second-level registry domain (like .co.uk), then it's up to the list maintainer to implement a wildcard so that xxxxx.co.uk returns an A record. I wouldn't worry about taking such an extreme case into account, since I cannot imagine any list wanting to do such widespread blocking.
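To make the level mismatch and the wildcard fix concrete, here is a toy model of a list server's answers that roughly approximates a DNS wildcard record such as `*.foo.za`. The function name and data are hypothetical; a real RHSBL is a DNS zone, not a Python set.

```python
def is_listed(query_domain: str, exact: set[str], wildcards: set[str]) -> bool:
    """Return True if the list would answer query_domain with an A record.

    exact     -- domains listed only under their own name
    wildcards -- domains also covered by a wildcard, so names below
                 them get an answer too (roughly like a DNS "*." record)
    """
    d = query_domain.lower().rstrip(".")
    if d in exact or d in wildcards:
        return True
    return any(d.endswith("." + w) for w in wildcards)


# The server lists foo.za at the second level, but a client that wrongly
# treats foo.za as a registry suffix queries the third level instead.
print(is_listed("bar.foo.za", exact={"foo.za"}, wildcards=set()))  # False: a miss
print(is_listed("bar.foo.za", exact=set(), wildcards={"foo.za"}))  # True: the wildcard catches it
```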
Yes, the two-level ccTLDs like co.uk should never get into a SURBL. Only registrar-type domains should, like foo.co.uk.
I believe there should be a mechanism which distinguishes whether a second- or third-level lookup is required, based on a static list of domains known to have or not have subdomains. If nothing is known, then the default should be to check both the second and third levels, as at present.
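The suggestion above, a static list lookup with a check-both fallback, might look like the sketch below. The suffix sets are small illustrative samples, not a real registry list, and the function name is hypothetical.

```python
# Illustrative samples only; a real deployment would use a maintained list.
TWO_LEVEL_SUFFIXES = {"co.uk", "com.za", "co.jp"}  # registrations happen one level below these
ONE_LEVEL_TLDS = {"com", "net", "org", "biz"}      # registrations happen directly below these


def lookup_candidates(host: str) -> list[str]:
    """Return the base domain(s) to check for host.

    Known two-level suffix -> third-level domain only.
    Known one-level TLD    -> second-level domain only.
    Unknown                -> both levels, as SpamCopURI does today.
    """
    labels = host.lower().rstrip(".").split(".")
    second = ".".join(labels[-2:])
    third = ".".join(labels[-3:])
    if second in TWO_LEVEL_SUFFIXES:
        return [third]
    if labels[-1] in ONE_LEVEL_TLDS:
        return [second]
    return [second] if third == second else [second, third]


for h in ("www.shop.co.uk", "www.spammer.biz", "www.example.bg"):
    print(h, lookup_candidates(h))
```

Only hosts under TLDs that appear in neither set pay for two lookups.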
Aha, now I think I understand what's being proposed.
Currently SpamCopURI checks all domains at the second and third level against a given SURBL, regardless of whether the domain is in a ccTLD or not.
It sounds like Eric is proposing a change where, if a domain is in the ccTLD list like co.uk, the client should try to extract and check a three-level domain like foo.co.uk. Otherwise it should check two levels, like foo.com.
Is that right? If so it may be ok, though our list of ccTLDs is slightly underspecified (there are some ccTLDs not in it). Note that my ccTLD list:
Yes. This is exactly what I am proposing.
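The proposed rule, exactly one query per domain chosen by the two-level ccTLD list, reduces to something like this hypothetical sketch. The list contents are an illustrative subset, not either of the lists linked in this thread.

```python
TWO_LEVEL_TLDS = {"co.uk", "com.za", "co.jp"}  # illustrative subset of a ccTLD list


def registered_domain(host: str) -> str:
    """Return the single domain to query for host.

    If the last two labels form a known two-level ccTLD (e.g. co.uk),
    keep three labels (foo.co.uk); otherwise keep two (foo.com).
    """
    labels = host.lower().rstrip(".").split(".")
    last_two = ".".join(labels[-2:])
    keep = 3 if last_two in TWO_LEVEL_TLDS else 2
    return ".".join(labels[-keep:])


print(registered_domain("www.foo.co.uk"))  # foo.co.uk
print(registered_domain("www.foo.com"))    # foo.com
```

Each URI domain now costs one lookup; the trade-off is that the client and the list must agree on the same two-level list.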
Kewl. Sounds good to me. I'm cc'ing the SpamAssassin developers to compare notes on how they're handling ccTLDs in message body URI checks.
http://spamcheck.freeapp.net/two-level-tlds
is (derived from but) slightly more complete than the one at http://www.bestregistrar.com/help/ccTLD.htm ....
Worst case is that we miss a few ccTLDs. Probably not too big a deal given that most of the spam domains are .com, .biz, etc.
I believe Eric is also making a finer point: other SURBL data sources may miss some unexpected geographic domains, for example where foo.za occurred but only domains under two-level base ccTLDs, like foo.com.za, were expected. Not sure how to handle unusual cases like that. I suppose we'll need to rely on the country-code authorities to be somewhat consistent about which levels they allow in their ccTLDs.
Philosophical point: it's always possible that some spam domains slip through the cracks, but if that happens often enough and we spot them, we can always blacklist them manually. Perfection may not be possible, but we're certainly greatly increasing the spam detection rates with this approach overall.
My only concern is that we leave a wide enough hole that we end up playing catch-up, with spammers running through various ccTLDs that we have misclassified and using them for links.
Aha, but if a domain is not in the ccTLD list, won't we check it on two levels on both the client and server sides and therefore catch it?
In other words, if somenewspamdomain.bg comes up and it's not in our ccTLD list, our client and server programs will automatically test it as a two-level domain and eventually catch it. In that case I think we're OK, and the only danger is blocking new legitimate two-level ccTLDs that we're not yet aware of, like newlegitimatetld.bg.
Jeff C.
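The fallback argument above can be checked by applying the same one-query rule on both the server and client sides. In this hypothetical sketch, nothing under .bg is in the illustrative two-level list, so an unknown .bg spam domain is reduced to two levels on both sides and still matches; the flip side is the false-positive risk if newlegitimatetld.bg is really a registry suffix we don't know about.

```python
TWO_LEVEL_TLDS = {"co.uk", "com.za"}  # illustrative; nothing under .bg is known


def registered_domain(host: str) -> str:
    """Keep three labels under a known two-level ccTLD, otherwise two."""
    labels = host.lower().rstrip(".").split(".")
    keep = 3 if ".".join(labels[-2:]) in TWO_LEVEL_TLDS else 2
    return ".".join(labels[-keep:])


# Server side: the list is built from domains seen in spam.
listed = {registered_domain("www.somenewspamdomain.bg")}

# Client side: a different host in the same spam domain still matches.
print(registered_domain("links.somenewspamdomain.bg") in listed)  # True

# The danger: if newlegitimatetld.bg were really a registry suffix, one spammer
# under it would get the whole suffix listed, taking innocent sites with it.
listed.add(registered_domain("spammer.newlegitimatetld.bg"))
print(registered_domain("innocent.newlegitimatetld.bg") in listed)  # True (false positive)
```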