[SURBL-Discuss] Re: Bug in Spamcop's surbl add-on module

Jeff Chan jeffc at surbl.org
Thu May 6 04:52:55 CEST 2004


On Wednesday, May 5, 2004, 2:10:36 PM, Chris Santerre wrote:

>>From: Jeff Chan [mailto:jeffc at surbl.org]

>>Hi Chris,
>>Would you mind if I added a quick regex to remove any third or
>>higher level domains from .com, .biz, .net, .info, etc.
>>domains before they go into be?  It wouldn't be perfect, but
>>it could help some.
>>
>>In other words trim down  e.1asphost.com  to  1asphost.com (etc)
>>in my own data munging?

> Jeff my friend, nothing would make me happier :)

OK I've added a "new"-style regex to remove any subdomains on
generic TLD domains:

    http://www.icann.org/tlds/

  s/^([^\.]*\.)+([^\.]*)\.(com|net|org|edu|mil|biz|info|int|arpa|name|museum|coop|aero|pro)$/\2.\3/

It seems to do the right thing, both on test cases and the actual
data, so it's now live on all the lists.  If anyone sees any
problems with this regex, please let me know.
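For anyone who wants to try the substitution above on their own test cases, here is a minimal Python sketch of it. It uses Python's `re` module rather than whatever tool runs the live regex, assumes input domains are already lowercase (the pattern is case-sensitive), and the helper name `strip_gtld_subdomains` is hypothetical.

```python
import re

# Same pattern as the s/// expression above, written for Python's re module.
# Group 2 is the registered second-level label, group 3 the generic TLD.
GTLD_SUBDOMAIN_RE = re.compile(
    r"^([^.]*\.)+([^.]*)\.(com|net|org|edu|mil|biz|info|int|arpa|name|museum|coop|aero|pro)$"
)


def strip_gtld_subdomains(domain: str) -> str:
    """Hypothetical helper: reduce e.g. e.1asphost.com to 1asphost.com.

    Domains that are already two labels, or that end in a TLD not in the
    list above (such as country-code TLDs), come back unchanged.
    Assumes the input is lowercase, since the pattern is case-sensitive.
    """
    return GTLD_SUBDOMAIN_RE.sub(r"\2.\3", domain)


if __name__ == "__main__":
    for d in ["e.1asphost.com", "a.b.example.net", "1asphost.com", "www.example.co.uk"]:
        print(d, "->", strip_gtld_subdomains(d))
```

Note that a name like www.example.co.uk is left alone, because ccTLDs are outside the alternation; only the gTLDs listed are trimmed.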

Bill's domains from sa-blacklist are already in the correct form :-)
and have no subdomains on these gTLD domains going into
ws.surbl.org.  I also added the regex to sc.surbl.org, where it did
get rid of a few errant records, so I should probably announce the
change.  Subdomains are now properly removed in be and sc, as
they should have been.
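To illustrate how the change can remove records from a list, here is a sketch that normalizes a list of domains and drops entries that collapse onto the same registered domain. The function name `normalize_records`, the lowercasing step, and the sample domains are assumptions for illustration; this is not the actual sc.surbl.org data pipeline.

```python
import re

GTLD_SUBDOMAIN_RE = re.compile(
    r"^([^.]*\.)+([^.]*)\.(com|net|org|edu|mil|biz|info|int|arpa|name|museum|coop|aero|pro)$"
)


def normalize_records(records):
    """Hypothetical list-munging step: strip gTLD subdomains, then de-duplicate.

    Lowercasing is an added assumption (the regex itself is case-sensitive).
    Order of first appearance is preserved so the result diffs cleanly
    against the previous list.
    """
    seen = set()
    out = []
    for rec in records:
        dom = GTLD_SUBDOMAIN_RE.sub(r"\2.\3", rec.strip().lower())
        if dom and dom not in seen:  # skip blank lines and collapsed duplicates
            seen.add(dom)
            out.append(dom)
    return out


if __name__ == "__main__":
    sample = ["e.1asphost.com", "1asphost.com", "www.example.biz", "example.biz"]
    print(normalize_records(sample))
```

With the sample input, the four records reduce to two, because each subdomain entry becomes a duplicate of its parent domain.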

This should result in better matching on both be and sc, since
clients are supposed to strip subdomains from message URIs in a
similar way before looking them up.
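On the client side, a sketch of the matching step might look like the following: take the host from a URI found in a message, apply the same reduction, and build the name to look up in a list zone. It assumes the usual DNS-list convention of prepending the reduced domain to the zone (e.g. example.com.sc.surbl.org); the helper name `surbl_query_name` is hypothetical, and the sketch only builds the query name without doing any DNS lookup.

```python
import re
from urllib.parse import urlsplit

GTLD_SUBDOMAIN_RE = re.compile(
    r"^([^.]*\.)+([^.]*)\.(com|net|org|edu|mil|biz|info|int|arpa|name|museum|coop|aero|pro)$"
)


def surbl_query_name(uri: str, zone: str = "sc.surbl.org"):
    """Hypothetical client helper: map a message URI to a DNS query name.

    Returns None when the URI has no hostname (e.g. mailto: links).
    Assumes the list is queried by prepending the reduced domain to the zone.
    """
    host = urlsplit(uri).hostname  # lowercased by urllib; port and userinfo dropped
    if not host:
        return None
    domain = GTLD_SUBDOMAIN_RE.sub(r"\2.\3", host)
    return f"{domain}.{zone}"


if __name__ == "__main__":
    print(surbl_query_name("http://e.1asphost.com/index.html"))
    print(surbl_query_name("http://WWW.Example.NET:8080/", zone="be.surbl.org"))
```

Because urllib lowercases the hostname, the case-sensitive pattern still matches mixed-case URIs. A real client would then resolve the name and treat an answer as a listing; that step is outside this sketch.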

Jeff C.