Fwd: Re: [SURBL-Discuss] checking plain domains in message bodies against SURBLs reportedly effective

Jeff Chan jeffc at surbl.org
Sun Sep 5 02:41:40 CEST 2004

This is a forwarded message
From: Theo Van Dinter <felicity at kluge.net>
To: SURBL Discussion list <discuss at lists.surbl.org>, SpamAssassin Developers <spamassassin-dev at incubator.apache.org>
Date: Saturday, September 4, 2004, 10:36:53 AM
Subject: [SURBL-Discuss]  checking plain domains in message bodies against SURBLs reportedly effective

===8<==============Original message text===============
On Sat, Sep 04, 2004 at 10:45:44AM -0600, Ryan Thompson wrote:
> Yep. Good idea, overall. There are a few gotchas:
> TLDs sometimes coincide with file extensions. We might have to whitelist
> command.com, and the entire country of Poland. :-)
> Since the domain is in plain text and doesn't contain a protocol or
> subdomain (e.g., 'www'), I haven't yet seen a mail client that will
> display it as a clickable URL.

This is generally the tack we're taking in SpamAssassin -- if a typical
MUA doesn't display it as a link, then we don't consider it a URL.
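As a rough illustration of that policy, here is a minimal Python sketch of a "would a mail client turn this into a link?" test. It assumes the criterion Ryan describes above: the text carries an explicit scheme or a leading "www.". The function name looks_linkified and the particular schemes it accepts are illustrative assumptions, not SpamAssassin's actual URI-detection code.

```python
import re

# Hypothetical heuristic: a token counts as a URL only if it starts with an
# explicit scheme (http, https, ftp) or a "www." prefix, which is roughly what
# the thread says common mail clients need before rendering a clickable link.
_LINKIFY_RE = re.compile(r"^(?:(?:https?|ftp)://|www\.)", re.IGNORECASE)


def looks_linkified(token: str) -> bool:
    """Return True if a typical MUA would likely display `token` as a link."""
    return bool(_LINKIFY_RE.match(token))


if __name__ == "__main__":
    for tok in ["http://example.com/", "www.example.com", "example.com", "command.com"]:
        print(tok, looks_linkified(tok))
```

Under this rule a bare "command.com" is never treated as a URL, which sidesteps the filename ambiguity at the cost of missing plain domains entirely.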

Another issue for the generic-domains approach is performance -- lots of
messages contain lots of strings that could potentially look like a domain,
and querying for them all adds a bit of load on both the client and the
servers.
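One possible way to bound that load, not something proposed in the thread, is to deduplicate candidates and cap the number of lookups per message before any DNS traffic is generated. The sketch below only builds the query names; it assumes the combined SURBL zone multi.surbl.org, and the helper name surbl_query_names and the max_lookups default of 20 are hypothetical.

```python
# Hypothetical helper: turn candidate domains into SURBL-style DNS query names
# (<domain>.<zone>), dropping duplicates and stopping after max_lookups names
# so that a single message cannot trigger an unbounded number of queries.
def surbl_query_names(candidates, zone="multi.surbl.org", max_lookups=20):
    seen = set()
    names = []
    for cand in candidates:
        # Normalise case and any trailing dot so duplicates collapse.
        name = cand.lower().rstrip(".")
        if name in seen:
            continue
        seen.add(name)
        names.append(f"{name}.{zone}")
        if len(names) >= max_lookups:
            break
    return names


if __name__ == "__main__":
    print(surbl_query_names(["Example.com", "example.com", "spam.biz"]))
```

Deduplication means a message repeating the same domain fifty times costs one lookup; the cap is a blunt guard whose right value would need measuring. A real client would also reduce hostnames to their registered domain before querying, which this sketch does not attempt.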

For instance:  /\b([a-zA-Z0-9_.-]{1,256}\.[a-zA-Z]{2,6})\b/

should, in theory (I haven't tested it), grab anything that looks like a
generic domain name in text.  If you check the matches against a list of
valid TLDs, you'd probably end up with a decent list, but you'd hit the
first issue quoted above: in "Go take a look at command.com", it isn't
clear whether command.com is a URL or a filename.
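The extract-then-filter idea can be sketched in Python as follows. DOMAIN_RE is a direct translation of the (untested) pattern above; VALID_TLDS is a deliberately tiny, hypothetical subset standing in for a full TLD list, and FILENAME_WHITELIST and the function names are likewise illustrative assumptions.

```python
import re

# Python translation of the untested pattern quoted above. It is deliberately
# loose: the character class also admits "_" and ".", so it over-matches.
DOMAIN_RE = re.compile(r"\b([a-zA-Z0-9_.-]{1,256}\.[a-zA-Z]{2,6})\b")

# Hypothetical, tiny TLD subset for illustration; a real filter would load
# the full list of delegated TLDs.
VALID_TLDS = {"com", "net", "org", "info", "biz", "pl", "uk", "de"}

# Hypothetical whitelist of names far more likely to be filenames than domains.
FILENAME_WHITELIST = {"command.com"}


def candidate_domains(text: str) -> list[str]:
    """Return every substring that looks like a generic domain name."""
    return DOMAIN_RE.findall(text)


def filter_candidates(text: str) -> list[str]:
    """Keep candidates whose last label is a known TLD, minus the whitelist."""
    kept = []
    for cand in candidate_domains(text):
        name = cand.lower()
        tld = name.rsplit(".", 1)[1]  # every match contains at least one "."
        if tld in VALID_TLDS and name not in FILENAME_WHITELIST:
            kept.append(name)
    return kept


if __name__ == "__main__":
    msg = ("Go take a look at command.com, then run install.pl, "
           "skim notes.txt or visit spam-example.biz")
    print(candidate_domains(msg))
    print(filter_candidates(msg))
```

On that sample, notes.txt is dropped because "txt" is not a TLD and command.com is removed only by the whitelist, but install.pl survives because "pl" is Poland's country-code TLD, which is exactly the file-extension collision Ryan jokes about.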


===8<===========End of original message text===========

