Jeff Chan wrote on Mon Apr 26 03:58:58 CEST 2004
The underlying principle as I see it is that most major sites will have functional anti-abuse and anti-spam policies, so either a base domain is good or bad. I know that seems simplistic, but it's easy and fast to implement AND it seems to match reality pretty well.
There may come a point where a "big" domain starts to appear in spam despite an outward image of "antispam" and the choice which the current infrastructure offers is to block all of it or none of it. Blocking all of it may produce too many FPs whereas blocking a subset may be acceptable. At the moment there haven't been any tough decisions to take on listing or whitelisting. Everything has been very clear cut. It's not guaranteed to stay that way.
I don't know how real a risk this is, but I was worried about an infrastructure which effectively ties our hands on this point. (Once surbl is deployed in many different client programs, I suppose it will be hard to change anything about the "public interface" to the data....)
That's an interesting idea. Basically you want to signal redirection to higher domain levels with a special result for levels that should never get checked like co.uk.
That might be doable, but it would require extra logic on the client side as you note. That already sounds more complex than I like, though I see what you're getting at. Better to control what goes into the data (i.e. never let the TLD itself co.uk in), and make sure the client is following similar rules.
What I was proposing is a *change* to client processing logic, but not added *complexity*; rather, a simplification.
What I mean is that currently the client has to contain
- processing logic
- data on the ccTLDs
(in the long run there will be multiple versions of the ccTLD data implemented in various client versions at any one time. Users will have to upgrade the client software to keep current with domain data).
Using an A record in surbl to indicate "this domain is not listed but a subdomain is. Try again" would mean the client just follows a simple processing rule. It doesn't need to know anything about specific domain data.
The logic would be something like this. Let's assume that "url" is found in an email.

    level = 2
    more_info_result = '127.0.0.255'
    listed_result = '127.0.0.2'
    do {
        result = query_dns(extract_domain(url, level))
        level = level + 1
    } while (result == more_info_result)

    if (result == listed_result) { score it }
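The loop above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the 127.0.0.255 "try again" value is the one proposed in this message and is not an existing SURBL return code, FAKE_ZONE is made-up data standing in for real DNS lookups, and the function names are hypothetical. The sketch also stops once the hostname runs out of labels, which the pseudocode leaves implicit.

```python
MORE_INFO = "127.0.0.255"  # proposed "not listed, but a subdomain is: try again" value
LISTED = "127.0.0.2"       # "listed" value

# Stand-in for the DNS zone (assumption: illustrative data, not real listings).
FAKE_ZONE = {
    "co.uk": MORE_INFO,
    "bigspammer.co.uk": LISTED,
    "bigspammer.com": LISTED,
}


def extract_domain(host, level):
    """Return the rightmost `level` labels of a hostname."""
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-level:])


def query_dns(name):
    """Hypothetical lookup; a real client would issue a DNS A query here."""
    return FAKE_ZONE.get(name)


def is_listed(host):
    """Start at the second level and go deeper while the zone says 'try again'."""
    depth = len(host.rstrip(".").split("."))
    level = 2
    result = query_dns(extract_domain(host, level))
    while result == MORE_INFO and level < depth:
        level += 1
        result = query_dns(extract_domain(host, level))
    return result == LISTED
```

With this rule the client carries no ccTLD table at all; which levels get a second look is decided entirely by what the data side publishes.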
We will always catch bigspammer.co.uk with the current strategy.
I think that some of the ccTLDs have a mixed assignment strategy. This means that they should sometimes be checked at the 2nd level, sometimes at the 3rd level. The current logic always checks at a single predefined level.
The two examples I saw were: .fr and .ca
Currently we check .ca at the third level, but it is possible to register a second level domain at .ca which we never catch so bigspammer.ca will get through.
We check .fr at the second level; however, there are many "standard" second level domains (like .nom.fr), which means we probably want to be checking these ones at the third level. (Translation: any bigspammer.nom.fr domain is immune to the current strategy unless we want to upset everyone who has a nom.fr domain by listing that).
I know a lot of what I argue for above seems simplistic when a more complex solution could have more interesting results, but very often the simpler solutions are better, especially in terms of resource consumption.
You're right to argue for the most simple solution. (In fact there are probably simpler solutions than the one I suggested!). My concern was to avoid inflexibility in the infrastructure and automatic immunity for various classes of domains.
John
On Tuesday, April 27, 2004, 1:47:41 PM, John Fawcett wrote:
Jeff Chan wrote on Mon Apr 26 03:58:58 CEST 2004
The underlying principle as I see it is that most major sites will have functional anti-abuse and anti-spam policies, so either a base domain is good or bad. I know that seems simplistic, but it's easy and fast to implement AND it seems to match reality pretty well.
There may come a point where a "big" domain starts to appear in spam despite an outward image of "antispam" and the choice which the current infrastructure offers is to block all of it or none of it. Blocking all of it may produce too many FPs whereas blocking a subset may be acceptable. At the moment there haven't been any tough decisions to take on listing or whitelisting. Everything has been very clear cut. It's not guaranteed to stay that way.
I don't know how real a risk this is, but I was worried about an infrastructure which effectively ties our hands on this point. (Once surbl is deployed in many different client programs, I suppose it will be hard to change anything about the "public interface" to the data....)
That's a valid concern about making the mechanisms too rigid, but the thing to remember is that legitimate parent domain operators like .uk or yahoo.com have a strong incentive to keep their child domains (subdomains) clean of spammer hosting and other abuse.
In other words I don't see the mixed case happening too often, simply due to the best interests of most legitimate sites in *staying* legitimate. But you're right we should not design ourselves into a corner unnecessarily.
That's an interesting idea. Basically you want to signal redirection to higher domain levels with a special result for levels that should never get checked like co.uk.
That might be doable, but it would require extra logic on the client side as you note. That already sounds more complex than I like, though I see what you're getting at. Better to control what goes into the data (i.e. never let the TLD itself co.uk in), and make sure the client is following similar rules.
What I was proposing is a *change* to client processing logic, but not added *complexity*; rather, a simplification.
What I mean is that currently the client has to contain
- processing logic
- data on the ccTLDs
(in the long run there will be multiple versions of the ccTLD data implemented in various client versions at any one time. Users will have to upgrade the client software to keep current with domain data).
Using an A record in surbl to indicate "this domain is not listed but a subdomain is. Try again" would mean the client just follows a simple processing rule. It doesn't need to know anything about specific domain data.
The logic would be something like this. Let's assume that "url" is found in an email.

    level = 2
    more_info_result = '127.0.0.255'
    listed_result = '127.0.0.2'
    do {
        result = query_dns(extract_domain(url, level))
        level = level + 1
    } while (result == more_info_result)

    if (result == listed_result) { score it }
Yes, this is a nice, modular approach. Though we may want to adjust the specifics, it's a good idea to make the handling of ccTLDs uniform across data and clients somehow.
Another approach would be for the SURBL data side to borrow the same SpamAssassin ccTLD modules that the two SA clients are using. Key is that we're all handling them similarly.
Remember that the goal is to capture the registered domain, whatever form that happens to take.
We will always catch bigspammer.co.uk with the current strategy.
I think that some of the ccTLDs have a mixed assignment strategy. This means that they should sometimes be checked at the 2nd level, sometimes at the 3rd level. The current logic always checks at a single predefined level.
Not quite; it's table-driven at least on the data side.
If co.uk is in the ccTLD table then the third level is checked, i.e. spammerdomain.co.uk. Since secondlevelspamdomain.uk is *not* in the table it would get checked at the second level... *and caught*. :-)
Eric or Justin, what is the Perl or SA module currently being used on the client side to handle ccTLDs again please? I should probably look into using it on the data side too.
The two examples I saw were: .fr and .ca
Currently we check .ca at the third level, but it is possible to register a second level domain at .ca which we never catch so bigspammer.ca will get through.
The signalling is not at the TLD. It's at whatever level is in the table. We don't list .ca, but we do list ab.ca. That means foobar.ab.ca gets checked at the third level and somenewspamdomain.ca gets checked at the second level.
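A minimal sketch of the table-driven rule described above, assuming the table is simply a set of two-level suffixes. The entries and the function name are illustrative only; this is not the actual SURBL data-side code or table.

```python
# Assumption: a tiny illustrative subset, not the real ccTLD table.
TWO_LEVEL_SUFFIXES = {"co.uk", "ab.ca", "nom.fr"}


def registered_domain(host):
    """Return the name to check: three labels if the last two are in the table, else two."""
    labels = host.lower().rstrip(".").split(".")
    if len(labels) >= 3 and ".".join(labels[-2:]) in TWO_LEVEL_SUFFIXES:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])


print(registered_domain("www.foobar.ab.ca"))      # third level: foobar.ab.ca
print(registered_domain("somenewspamdomain.ca"))  # second level: somenewspamdomain.ca
```

Because the lookup is keyed on the two-level suffix rather than on the TLD, .ca can be mixed: ab.ca names are checked at the third level while names registered directly under .ca are checked at the second.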
We check .fr at the second level; however, there are many "standard" second level domains (like .nom.fr), which means we probably want to be checking these ones at the third level. (Translation: any bigspammer.nom.fr domain is immune to the current strategy unless we want to upset everyone who has a nom.fr domain by listing that).
In this case there's a lack of data on the .fr ccTLDs. If someone could research that and get them to me I'll add them to our table. (Ditto any other countries. :-) FWIW I just added nom.fr to:
nom.fr tm.fr gouv.fr asso.fr nom.fr avocat.fr notaire.fr barreau.fr mairie.fr
The danger in lacking ccTLD data is not that spammers will get away with hosting (spammer.fr will always get caught if the ccTLD data is missing), but that a legitimate ccTLD might get added to the blocklists, i.e. a false positive.
I know a lot of what I argue for above seems simplistic when a more complex solution could have more interesting results, but very often the simpler solutions are better, especially in terms of resource consumption.
You're right to argue for the most simple solution. (In fact there are probably simpler solutions than the one I suggested!). My concern was to avoid inflexibility in the infrastructure and automatic immunity for various classes of domains.
Which is always a good concern. In some ways the simpler the solution the more flexible it will be and vice versa.
Jeff C.
----- Original Message ----- From: "Jeff Chan"
On Tuesday, April 27, 2004, 1:47:41 PM, John Fawcett wrote:
Jeff Chan wrote on Mon Apr 26 03:58:58 CEST 2004
I think that some of the ccTLDs have a mixed assignment strategy. This means that they should sometimes be checked at the 2nd level, sometimes at the 3rd level. The current logic always checks at a single predefined level.
Not quite; it's table-driven at least on the data side.
If co.uk is in the ccTLD table then the third level is checked, i.e. spammerdomain.co.uk. Since secondlevelspamdomain.uk is *not* in the table it would get checked at the second level... *and caught*. :-)
As far as I could see, the table in SpamCopURI contains only .uk, not co.uk, so all .uk domains are being handled in the same way, i.e. checked at the third level.
Eric or Justin, what is the Perl or SA module currently being used on the client side to handle ccTLDs again please? I should probably look into using it on the data side too.
The two examples I saw were: .fr and .ca
Currently we check .ca at the third level, but it is possible to register a second level domain at .ca which we never catch so bigspammer.ca will get through.
The signalling is not at the TLD. It's at whatever level is in the table. We don't list .ca, but we do list ab.ca. That means foobar.ab.ca gets checked at the third level and somenewspamdomain.ca gets checked at the second level.
Likewise, I saw .ca in the table, not ab.ca, so just as in the .uk example everything is being checked at the third level by the client, and so spammer.ca will be missed.
We check .fr at the second level; however, there are many "standard" second level domains (like .nom.fr), which means we probably want to be checking these ones at the third level. (Translation: any bigspammer.nom.fr domain is immune to the current strategy unless we want to upset everyone who has a nom.fr domain by listing that).
In this case there's a lack of data on the .fr ccTLDs. If someone could research that and get them to me I'll add them to our table. (Ditto any other countries. :-) FWIW I just added nom.fr to:
nom.fr tm.fr gouv.fr asso.fr nom.fr avocat.fr notaire.fr barreau.fr mairie.fr
I didn't spot any of these being used on the client. So if I am reading things correctly we will never catch spammer.nom.fr etc.
Maybe if Eric is reading this, he can confirm whether this is the case.
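To make the suspected mismatch concrete, here is a rough sketch comparing a client that keys only on the TLD with a data side that keys on two-level suffixes. Both tables and both functions are assumptions based on the descriptions in this thread; this is not the real SpamCopURI code or the real SURBL table.

```python
# Assumption: the client treats every name under these TLDs as three-level.
CLIENT_THREE_LEVEL_TLDS = {"uk", "ca"}
# Assumption: the data side keys on specific two-level suffixes instead.
DATA_TWO_LEVEL_SUFFIXES = {"co.uk", "ab.ca", "nom.fr"}


def _labels(host):
    return host.lower().rstrip(".").split(".")


def client_key(host):
    """Name the (hypothetical) client would query: depth chosen by TLD alone."""
    labels = _labels(host)
    depth = 3 if labels[-1] in CLIENT_THREE_LEVEL_TLDS else 2
    return ".".join(labels[-depth:])


def data_key(host):
    """Name the (hypothetical) data side would list: depth chosen by suffix table."""
    labels = _labels(host)
    if len(labels) >= 3 and ".".join(labels[-2:]) in DATA_TWO_LEVEL_SUFFIXES:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])


def mismatches(hosts):
    """Hosts for which the client would query a different name than the data lists."""
    return [h for h in hosts if client_key(h) != data_key(h)]


print(mismatches(["www.spammer.ca", "www.foobar.ab.ca", "www.spammer.nom.fr"]))
```

Under these assumptions www.spammer.ca is queried as www.spammer.ca while the data would list spammer.ca, and www.spammer.nom.fr is queried as nom.fr while the data would list spammer.nom.fr, so neither lookup can match.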
John
On Tuesday, April 27, 2004, 10:37:22 PM, John Fawcett wrote:
As far as I could see, the table in SpamCopURI contains only .uk, not co.uk, so all .uk domains are being handled in the same way, i.e. checked at the third level.
Likewise, I saw .ca in the table, not ab.ca, so just as in the .uk example everything is being checked at the third level by the client, and so spammer.ca will be missed.
...
tm.fr gouv.fr asso.fr nom.fr avocat.fr
...
I didn't spot any of these being used on the client. So if I am reading things correctly we will never catch spammer.nom.fr etc.
Maybe if Eric is reading this, he can confirm whether this is the case.
Thanks for the research into how SpamCopURI is handling ccTLDs.
In case it wasn't clear, I was referring to the data side in my description of how the ccTLDs are handled.
For best performance, we probably want to make both the data and client sides behave similarly, whether it's by changing the data side to use the SA module handling ccTLDs, by getting zones with more than two levels out via a special zone or value in SURBLs, or some other way.
But we can say that whitelisting of the known legitimate two-level ccTLDs will guarantee that they won't get into the data and therefore won't match in any SURBL queries. It's a partial solution and does help prevent most FPs that might happen from matching the specific ccTLDs. But it may not be the ultimate solution.
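The whitelisting guarantee can be sketched as a filter applied before anything enters the data. The whitelist contents and the function name below are illustrative assumptions, not the actual SURBL whitelist or tooling.

```python
# Assumption: illustrative entries only, not the real whitelist.
CCTLD_WHITELIST = {"co.uk", "nom.fr", "ab.ca"}


def build_blocklist(candidates, whitelist=CCTLD_WHITELIST):
    """Normalize candidate names and drop any that are whitelisted two-level ccTLDs."""
    normalized = {name.lower().rstrip(".") for name in candidates}
    return normalized - whitelist


print(sorted(build_blocklist(["spammer.nom.fr", "nom.fr", "Bigspammer.co.uk", "co.uk"])))
```

Whatever the clients end up querying, a whitelisted suffix such as nom.fr can never produce a hit, because it never reaches the zone. The filter only prevents FPs; it does nothing about a client that queries at the wrong level.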
I'll also add a couple points:
1. For SURBLs to be useful, preventing FPs is very important, probably more so than catching 100% of spam.
2. So far, :-) there is relatively little abuse of geographic domain names. By far the most abused geographic domain is .us. Spam URI domains in .com, .biz, etc. are several orders of magnitude more numerous than any geographic ones. In that sense catching those is a higher priority, and we are canonically if imperfectly meeting that now.
Jeff C.
On Wed, Apr 28, 2004 at 12:02:26AM -0700, Jeff Chan wrote:
On Tuesday, April 27, 2004, 10:37:22 PM, John Fawcett wrote:
As far as I could see, the table in SpamCopURI contains only .uk, not co.uk, so all .uk domains are being handled in the same way, i.e. checked at the third level.
Likewise, I saw .ca in the table, not ab.ca, so just as in the .uk example everything is being checked at the third level by the client, and so spammer.ca will be missed.
...
tm.fr gouv.fr asso.fr nom.fr avocat.fr
...
I didn't spot any of these being used on the client. So if I am reading things correctly we will never catch spammer.nom.fr etc.
Maybe if Eric is reading this, he can confirm whether this is the case.
Thanks for the research into how SpamCopURI is handling ccTLDs.
In case it wasn't clear, I was referring to the data side in my description of how the ccTLDs are handled.
For best performance, we probably want to make both the data and client sides behave similarly, whether it's by changing the data side to use the SA module handling ccTLDs, by getting zones with more than two levels out via a special zone or value in SURBLs, or some other way.
Agreed. Currently, the way things are set up, the only way we can guarantee that we catch *everything* we intend to catch is if both the client and the server implement identical logic. Ideally, only the server would implement this so clients wouldn't have to adapt to any logic changes, but the exception cases seem so rare (famous last words) that I am not too worried about it.
--eric
But we can say that whitelisting of the known legitimate two-level ccTLDs will guarantee that they won't get into the data and therefore won't match in any SURBL queries. It's a partial solution and does help prevent most FPs that might happen from matching the specific ccTLDs. But it may not be the ultimate solution.
I'll also add a couple points:
- For SURBLs to be useful, preventing FPs is very important, probably more so than catching 100% of spam.
- So far, :-) there is relatively little abuse of geographic domain names. By far the most abused geographic domain is .us. Spam URI domains in .com, .biz, etc. are several orders of magnitude more numerous than any geographic ones. In that sense catching those is a higher priority, and we are canonically if imperfectly meeting that now.
Jeff C.
Hi, here are some possible additions to your SLD list (copied from my whois server list, I didn't test the servers in 2004):
ac.za      whois.ac.za
co.za      http://www2.frd.ac.za/uninet/zadomains.html
au.com     whois.au.com
e164.arpa  whois.ripe.net
Bye, Frank