-----Original Message-----
From: Jeff Chan [mailto:jeffc@surbl.org]
Sent: Wednesday, May 05, 2004 4:52 PM
To: SURBL Discuss
Cc: spamassassin-users@incubator.apache.org
Subject: Re: Bug in Spamcop's surbl add-on module
On Wednesday, May 5, 2004, 1:03:15 PM, Chris Santerre wrote:
From: Jeff Chan [mailto:jeffc@surbl.org]

bigevil.domains:e.1asphost.com
However, we should be taking the subdomain off before it goes into the SURBL form. I'll need to check with Chris on how we can handle that.
It's not a bug, it's a feature! Actually, no, it's a bug :) I have been going through cleaning these sorts of things up. However, I started at the beginning of the B.E. list. This is at rule 181, and I haven't even got to 100 yet! This is left over from my alpha phase of B.E. I've had a lot more coffee since then.
Hi Chris,

Would you mind if I added a quick regex to remove any third- or higher-level domains from .com, .biz, .net, .info, etc. domains before they go into be? It wouldn't be perfect, but it could help some.
In other words, trim e.1asphost.com down to 1asphost.com (etc.) in my own data munging?
Jeff C.
Jeff, my friend, nothing would make me happier :) OK, maybe if you sent some models bearing beer to ask me, that might be a little better. But barring that, sure :-)
--Chris
On Wednesday, May 5, 2004, 2:10:36 PM, Chris Santerre wrote:
From: Jeff Chan [mailto:jeffc@surbl.org]
Hi Chris,

Would you mind if I added a quick regex to remove any third- or higher-level domains from .com, .biz, .net, .info, etc. domains before they go into be? It wouldn't be perfect, but it could help some.
In other words, trim e.1asphost.com down to 1asphost.com (etc.) in my own data munging?
Jeff, my friend, nothing would make me happier :)
OK, I've added a "new"-style regex to remove any subdomains on generic TLD domains:
http://www.icann.org/tlds/
s/^([^.]*\.)+([^.]*)\.(com|net|org|edu|mil|biz|info|int|arpa|name|museum|coop|aero|pro)$/\2.\3/
It seems to do the right thing, both on test cases and the actual data, so it's now live on all the lists. If anyone sees any problems with this regex, please let me know.
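For readers who want to try the substitution outside sed, here is a minimal Python sketch of the same rewrite. It assumes the dots between labels are meant literally (written \. here); the function name trim_gtld_subdomains is illustrative and not part of any SURBL tool. Python's re.sub accepts \2 and \3 in the replacement string, just as sed does.

```python
import re

# Same pattern as the sed substitution, with the dots between labels escaped
# so they match a literal "." rather than any character.
GTLD_SUBDOMAIN = re.compile(
    r"^([^.]*\.)+([^.]*)\.(com|net|org|edu|mil|biz|info|int|arpa|name"
    r"|museum|coop|aero|pro)$"
)


def trim_gtld_subdomains(domain: str) -> str:
    """Reduce host.example.com to example.com for the listed generic TLDs."""
    return GTLD_SUBDOMAIN.sub(r"\2.\3", domain)


for d in ["e.1asphost.com", "a.b.c.example.biz", "1asphost.com", "www.example.co.uk"]:
    print(d, "->", trim_gtld_subdomains(d))
```

Names that already have no subdomain, and names outside the listed TLDs such as www.example.co.uk, come back unchanged because the pattern does not match them.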
Bill's domains from sa-blacklist are already in the correct form :-) and have no subdomains on these gTLD domains going into ws.surbl.org. I also added it to sc.surbl.org, which did get rid of a few errant records, so I should probably announce the change. Subdomains are now properly removed in be and sc, as they should have been.
This should result in better matching on both be and sc since the clients are supposed to be doing similar things with message URIs.
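On the client side, a rough sketch of the same normalization followed by building the DNS name to query. It assumes the usual DNSBL-style convention of prepending the trimmed domain to the list zone (sc.surbl.org or ws.surbl.org here); surbl_query_name is a hypothetical helper, and no actual DNS lookup is performed.

```python
import re

# Same generic-TLD trimming as on the list side (dots escaped as literals).
GTLD_SUBDOMAIN = re.compile(
    r"^([^.]*\.)+([^.]*)\.(com|net|org|edu|mil|biz|info|int|arpa|name"
    r"|museum|coop|aero|pro)$"
)


def surbl_query_name(uri_host: str, zone: str = "sc.surbl.org") -> str:
    """Build the DNS name a client would look up for a host from a message URI.

    Assumes the trimmed domain is simply prepended to the list zone; an
    A record in the answer would then indicate a listing.
    """
    host = uri_host.lower().rstrip(".")
    domain = GTLD_SUBDOMAIN.sub(r"\2.\3", host)
    return f"{domain}.{zone}"


print(surbl_query_name("E.1aspHost.com"))                   # 1asphost.com.sc.surbl.org
print(surbl_query_name("www.example.net", "ws.surbl.org"))  # example.net.ws.surbl.org
```

The point of doing the same trim on both sides is that the name the client asks about is exactly the form in which the list stores it.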
Jeff C.
Hello Jeff,
Thursday, May 6, 2004, 3:52:55 AM, you wrote:
JC> OK I've added a "new"-style regex to remove any subdomains on
JC> generic TLD domains:
JC> http://www.icann.org/tlds/
JC>
JC> s/^([^.]*\.)+([^.]*)\.(com|net|org|edu|mil|biz|info|int|arpa|name|museum|coop|aero|pro)$/\2.\3/
                                                                 ^^^^
The .name TLD started life as a 3-level TLD. Many people have individual abc.def.name domains (e.g., my own robert.menschel.name).
If you strip that third level, then if someone registers spammer.menschel.name (which I have no control over, since I cannot register menschel.name), and spammer.menschel.name then gets added to your lists, my robert.menschel.name will be collateral damage.
I realize that since the .name TLD now accepts both 2-level and 3-level domains (2-level is OK if nobody owns a 3-level domain with that 2nd level), this may be a very complex issue.
Bob Menschel
alternate email address: bob@robert.menschel.net
alternate web site: http://robert.menschel.name
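Bob's point can be checked with the pattern as first posted (still including name, dots assumed literal): both third-level .name registrations collapse to the same second-level entry, so a listing aimed at the spammer would also match his domain. The host names come from his message; spammer.menschel.name is hypothetical.

```python
import re

# The pattern as first posted, which still lists "name" among the generic TLDs.
ORIGINAL = re.compile(
    r"^([^.]*\.)+([^.]*)\.(com|net|org|edu|mil|biz|info|int|arpa|name"
    r"|museum|coop|aero|pro)$"
)

spam_host = "spammer.menschel.name"     # hypothetical spammer registration
innocent_host = "robert.menschel.name"  # an unrelated personal .name domain

listed = ORIGINAL.sub(r"\2.\3", spam_host)         # what would go on the list
looked_up = ORIGINAL.sub(r"\2.\3", innocent_host)  # what a client would check

print(listed, looked_up, listed == looked_up)  # menschel.name menschel.name True
```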
-----Original Message-----
From: discuss-bounces@lists.surbl.org [mailto:discuss-bounces@lists.surbl.org] On Behalf Of Robert Menschel
Sent: Friday, 7 May 2004 1:18 PM
To: Jeff Chan
Cc: SURBL Discuss; spamassassin-users@incubator.apache.org
Subject: [SURBL-Discuss] Re[2]: Bug in Spamcop's surbl add-on module
Hello Jeff,
Thursday, May 6, 2004, 3:52:55 AM, you wrote:
JC> OK I've added a "new"-style regex to remove any subdomains on
JC> generic TLD domains:
JC> http://www.icann.org/tlds/
JC>
JC> s/^([^.]*\.)+([^.]*)\.(com|net|org|edu|mil|biz|info|int|arpa|name|museum|coop|aero|pro)$/\2.\3/
                                                                 ^^^^
The .name TLD started life as a 3-level TLD. Many people have individual abc.def.name domains (e.g., my own robert.menschel.name).
If you strip that third level, then if someone registers spammer.menschel.name (which I have no control over, since I cannot register menschel.name), and spammer.menschel.name then gets added to your lists, my robert.menschel.name will be collateral damage.
We've also found professional email marketing companies that are used both by large whitehat companies and by other companies that are far less reputable. These companies regularly use the client.domain.com format for image and href URLs, so rather than blocking the whole domain, we block or whitelist the subdomain.
Is there a problem in leaving this kind of flexibility in the plugin and also in the surbl.org SURBLs?
Cheers!
Dave
======================================================================== Pain free spam & virus protection by: www.mailsecurity.net.au Forward undetected SPAM to: spam@mailsecurity.net.au ========================================================================
On Thursday, May 6, 2004, 9:18:17 PM, David Hooton wrote:
-----Original Message-----
From: discuss-bounces@lists.surbl.org [mailto:discuss-bounces@lists.surbl.org] On Behalf Of Robert Menschel
The .name TLD started life as a 3-level TLD. Many people have individual abc.def.name domains (e.g., my own robert.menschel.name).
If you strip that third level, then if someone registers spammer.menschel.name (which I have no control over, since I cannot register menschel.name), and spammer.menschel.name then gets added to your lists, my robert.menschel.name will be collateral damage.
We've also found professional email marketing companies that are used both by large whitehat companies and by other companies that are far less reputable. These companies regularly use the client.domain.com format for image and href URLs, so rather than blocking the whole domain, we block or whitelist the subdomain.
Is there a problem in leaving this kind of flexibility in the plugin and also in the surbl.org SURBLs?
In principle, the system can be made to handle subdomains or any arbitrary level of domains, but in practice we have not found that useful or necessary very often. Typically a domain is either spammy or it isn't. Reputable domains don't allow spammer subdomains; for example, spammer.yahoo.com or spammer.msn.com don't exist, or wouldn't for very long.
Most of the hard-core spammers seem to use a disposable second-level .com domain for a few days, then abandon it in favor of a new one.
The quick and perhaps somewhat wrong solution is to whitelist client.domain.com if domain.com is partially legitimate. In practice we don't see that happening too often, though I'm interested in hearing examples.
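A sketch of the whitelist-first idea, using made-up data: the full host is checked against a whitelist before falling back to the second-level domain. WHITELIST, BLOCKLIST, registered_domain and is_listed are all hypothetical names, and the two-label trim is only reasonable for generic TLDs like .com.

```python
# Hypothetical data; in practice these entries would come from the list maintainers.
WHITELIST = {"client.domain.com"}  # a legitimate customer of a mixed-use mailer
BLOCKLIST = {"domain.com"}         # the mailer's registered domain


def registered_domain(host: str) -> str:
    """Keep the last two labels (crude; only sensible for generic TLDs)."""
    return ".".join(host.split(".")[-2:])


def is_listed(host: str) -> bool:
    host = host.lower()
    # Check the full host name first, so a whitelisted subdomain wins...
    if host in WHITELIST:
        return False
    # ...then fall back to the second-level domain, as the lists normally do.
    return registered_domain(host) in BLOCKLIST


print(is_listed("client.domain.com"))   # False
print(is_listed("spammer.domain.com"))  # True
```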
Jeff C.
On Thursday, May 6, 2004, 8:17:39 PM, Robert Menschel wrote:
Thursday, May 6, 2004, 3:52:55 AM, you wrote:
JC>> OK I've added a "new"-style regex to remove any subdomains on
JC>> generic TLD domains:
JC>> http://www.icann.org/tlds/
JC>>
JC>> s/^([^.]*\.)+([^.]*)\.(com|net|org|edu|mil|biz|info|int|arpa|name|museum|coop|aero|pro)$/\2.\3/
                                                                  ^^^^
The .name TLD started life as a 3-level TLD. Many people have individual abc.def.name domains (e.g., my own robert.menschel.name).
If you strip that third level, then if someone registers spammer.menschel.name (which I have no control over, since I cannot register menschel.name), and spammer.menschel.name then gets added to your lists, my robert.menschel.name will be collateral damage.
I realize that since the .name TLD now accepts both 2-level and 3-level domains (2-level is OK if nobody owns a 3-level domain with that 2nd level), this may be a very complex issue.
Thanks for the heads-up, Bob! To be safe, I've removed .name from this regex. The list should contain only TLDs where just the second level is directly registrable. Now it looks like:
s/^([^.]*\.)+([^.]*)\.(com|net|org|edu|mil|biz|info|int|arpa|museum|coop|aero|pro)$/\2.\3/
Any domain whose TLD is not on this list may be processed at the third level, which of course includes many geographic TLDs (e.g., names under co.uk).
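The effect of dropping name can be seen with a short Python sketch of the revised pattern (dots again assumed literal): .com subdomains are still trimmed, while .name and geographic names such as www.example.co.uk pass through unchanged.

```python
import re

# Revised pattern with "name" removed; dots between labels escaped as literals.
GTLD_SUBDOMAIN = re.compile(
    r"^([^.]*\.)+([^.]*)\.(com|net|org|edu|mil|biz|info|int|arpa"
    r"|museum|coop|aero|pro)$"
)

for host in ["e.1asphost.com", "robert.menschel.name", "www.example.co.uk"]:
    print(host, "->", GTLD_SUBDOMAIN.sub(r"\2.\3", host))
```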
Anyone know if .int or .arpa have any similar properties? OTOH, the more unusual TLDs don't seem to be used in spam very often; we see a lot of com and biz mostly.
<teaching mode for anyone interested>
I probably should have mentioned that this is POSIX new-style (extended) regex syntax as used with sed, which needs sed's extended-regex option (-E, or -r in GNU sed), and that it differs from Perl, for example, in referring to the memorized portions as \2 and \3 instead of $2 and $3 as they would be in a Perl substitution.
To explain the regex: [^.] is the class of characters other than dot, so ([^.]*\.)+ means one or more repetitions of zero or more non-dot characters followed by a dot. That is followed by ([^.]*)\., zero or more non-dot characters followed by a dot, and then by com, or net, or org, etc. Only the last two parts are output, by \2 dot \3; the first group can span several labels, and everything it matched is dropped. Caret ^ and dollar $ anchor the pattern to the start and end of the line. The caret is probably unnecessary, but the dollar matters: without it, x.example.com.au would be rewritten to example.com.au. Those *s could probably be +s, where + means one or more and * means zero or more.
</teaching mode for anyone interested>
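To go with the teaching notes, a Python sketch comparing the posted pattern with two variants: one with the *s changed to +s, and one without the trailing $. The variant names are illustrative, and the dots are assumed literal. It shows that the end anchor does real work (without it, the .com inside x.example.com.au gets trimmed), and that + rejects empty labels such as a..com, which the * version turns into .com.

```python
import re

TLDS = "com|net|org|edu|mil|biz|info|int|arpa|museum|coop|aero|pro"

# Variants of the revised pattern, for comparison.
STAR = re.compile(rf"^([^.]*\.)+([^.]*)\.({TLDS})$")      # as posted
PLUS = re.compile(rf"^([^.]+\.)+([^.]+)\.({TLDS})$")      # *s changed to +s
NO_DOLLAR = re.compile(rf"^([^.]*\.)+([^.]*)\.({TLDS})")  # end anchor dropped


def trim(pattern: re.Pattern, host: str) -> str:
    # \2 and \3 are the captured second-level label and TLD, as in sed.
    return pattern.sub(r"\2.\3", host)


print(trim(STAR, "x.example.com.au"))       # x.example.com.au (no match)
print(trim(NO_DOLLAR, "x.example.com.au"))  # example.com.au (trimmed mid-name)
print(trim(STAR, "a..com"))                 # .com (empty label accepted)
print(trim(PLUS, "a..com"))                 # a..com (no match)
```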
Jeff C.
Jeff Chan wrote:
Anyone know if .int or .arpa have any similar properties? OTOH, the more unusual TLDs don't seem to be used in spam very often; we see a lot of com and biz mostly.
I've been getting some spam from a few .int domains. At least the return address is a .int address; I haven't checked the actual URLs inside them for .int domains, though.
-Doc (D-Ninja)
On Thursday, May 6, 2004, 9:33:36 PM, Doc Schneider wrote:
Jeff Chan wrote:
Anyone know if .int or .arpa have any similar properties? OTOH, the more unusual TLDs don't seem to be used in spam very often; we see a lot of com and biz mostly.
I've been getting some spam from a few .int domains. At least the return address is a .int address; I haven't checked the actual URLs inside them for .int domains, though.
For SURBLs all we care about are the URI domains. As we know, any other forward domains can be trivially forged.
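As a rough sketch of what "URI domains" means in practice, the following Python pulls host names out of http(s) links in a message body and ignores the From address entirely. The URL regex and the uri_hosts helper are assumptions for illustration; real clients have to cope with HTML, encodings and obfuscated links.

```python
import re
from urllib.parse import urlsplit

# Deliberately simple URL matcher; real message parsing is considerably harder.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def uri_hosts(body: str) -> set[str]:
    """Return the lowercase host names of http(s) URLs found in a message body."""
    hosts = set()
    for url in URL_RE.findall(body):
        host = urlsplit(url).hostname  # drops user info and port, lowercases
        if host:
            hosts.add(host.rstrip("."))
    return hosts


message = """From: someone@example.int
Click http://E.1aspHost.com/offer?id=7 or <a href="http://www.example.biz:8080/x">here</a>.
"""
print(sorted(uri_hosts(message)))  # ['e.1asphost.com', 'www.example.biz']
```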
Jeff C.