Re: Fwd: Re: Fwd: Re: Announcing SURBL support in SA 2.63 and 3.0 plugins - bug ?
jeffc at surbl.org
Fri Apr 16 05:20:44 CEST 2004
Simon Byrnand, Eric Kolve and I were discussing which characters
are legal in domain names, because junk showing up around URIs
appears to be confusing some of the SpamAssassin URI parsing
code. I wanted to share some research and ask whether anyone has
other authoritative information on which characters are currently
legal in domain names. This is relevant to anyone working with
domain names.
Also, Eric, please share any bugs you find in the SA URI parsing
code, preferably by opening a Bugzilla ticket, especially if you
can isolate the module, etc.:
Here's a little research on the subject:
The original domain name RFC (RFC 1035) allowed only letters,
digits and the hyphen in names:
    <domain> ::= <subdomain> | " "

    <subdomain> ::= <label> | <subdomain> "." <label>

    <label> ::= <letter> [ [ <ldh-str> ] <let-dig> ]

    <ldh-str> ::= <let-dig-hyp> | <let-dig-hyp> <ldh-str>

    <let-dig-hyp> ::= <let-dig> | "-"

    <let-dig> ::= <letter> | <digit>

    <letter> ::= any one of the 52 alphabetic characters A through Z in
                 upper case and a through z in lower case

    <digit> ::= any one of the ten digits 0 through 9

    Note that while upper and lower case letters are allowed in domain
    names, no significance is attached to the case. That is, two names with
    the same spelling but different case are to be treated as if identical.
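
As a rough illustration of how strict that grammar is, here is a minimal Python sketch that checks a name against it. The names LABEL_RE, is_ldh_domain and same_domain are made up for this example and are not taken from SpamAssassin. The sketch follows the quoted grammar literally, so it rejects the trailing-dot form of a fully qualified name and any label that starts with a digit:

```python
import re

# One <label>: a letter, optionally followed by letters, digits or hyphens,
# where the last character must be a letter or digit.
LABEL_RE = re.compile(r"[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def is_ldh_domain(name: str) -> bool:
    """Return True if name matches the <domain> grammar quoted above."""
    if name == " ":
        # The grammar's second alternative, used for the root.
        return True
    # <subdomain> is one or more <label>s separated by dots.
    return all(LABEL_RE.fullmatch(label) for label in name.split("."))


def same_domain(a: str, b: str) -> bool:
    """Compare two names while ignoring case, as the quoted note requires."""
    return a.lower() == b.lower()
```

With this check, "surbl.org" and "a-b.example" pass, while "-bad.example", "bad-.example" and "exa_mple.com" do not.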
But RFC 2181 leaves things wide open with respect to names:
    11. Name syntax

    Occasionally it is assumed that the Domain Name System serves only
    the purpose of mapping Internet host names to data, and mapping
    Internet addresses to host names. This is not correct, the DNS is a
    general (if somewhat limited) hierarchical database, and can store
    almost any kind of data, for almost any purpose.

    The DNS itself places only one restriction on the particular labels
    that can be used to identify resource records. That one restriction
    relates to the length of the label and the full name. The length of
    any one label is limited to between 1 and 63 octets. A full domain
    name is limited to 255 octets (including the separators). The zero
    length full name is defined as representing the root of the DNS tree,
    and is typically written and displayed as ".". Those restrictions
    aside, any binary string whatever can be used as the label of any
    resource record. Similarly, any binary string can serve as the value
    of any record that includes a domain name as some or all of its value
    (SOA, NS, MX, PTR, CNAME, and any others that may be added).
    Implementations of the DNS protocols must not place any restrictions
    on the labels that can be used. In particular, DNS servers must not
    refuse to serve a zone because it contains labels that might not be
    acceptable to some DNS client programs. A DNS server may be
    configurable to issue warnings when loading, or even to refuse to
    load, a primary zone containing labels that might be considered
    questionable, however this should not happen by default.

    Note however, that the various applications that make use of DNS data
    can have restrictions imposed on what particular values are
    acceptable in their environment. For example, that any binary label
    can have an MX record does not imply that any binary name can be used
    as the host part of an e-mail address. Clients of the DNS can impose
    whatever restrictions are appropriate to their circumstances on the
    values they use as keys for DNS lookup requests, and on the values
    returned by the DNS. If the client has such restrictions, it is
    solely responsible for validating the data from the DNS to ensure
    that it conforms before it makes any use of that data.
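
Under that reading, the length limits are the only thing left to check. Here is a minimal Python sketch, assuming labels are handed over as raw byte strings and that the 255-octet limit is counted the way names are encoded on the wire (one length octet per label, plus the terminating zero octet for the root). The helper name fits_dns_length_limits is hypothetical:

```python
from typing import Sequence


def fits_dns_length_limits(labels: Sequence[bytes]) -> bool:
    """Check only the label and full-name length limits quoted above.

    The label contents are deliberately not inspected: any octets are allowed.
    An empty sequence stands for the zero-length name, i.e. the root.
    """
    # Each label must be between 1 and 63 octets long.
    if any(not 1 <= len(label) <= 63 for label in labels):
        return False
    # Assumption: count the full name as encoded on the wire, i.e. one
    # length octet per label plus a final zero octet for the root.
    wire_length = sum(1 + len(label) for label in labels) + 1
    return wire_length <= 255
```

For example, [b"surbl", b"org"] passes, and so does a label holding arbitrary binary such as b"\x00\xff junk!", whereas a 64-octet label or a name made of four 63-octet labels does not.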
After scanning the RFC descriptions that were linked from
> RFC 1035 Domain names - implementation and specification.
> Authors: P.V. Mockapetris.
> Date: Nov-01-1987
> Formats: txt pdf
> Obsoletes: RFC 0973, RFC 0882, RFC 0883
> Updated by: RFC 1101, RFC 1183, RFC 1348, RFC 1876, RFC 1982,
> RFC 1995, RFC 1996, RFC 2065, RFC 2136, RFC 2181, RFC 2137, RFC
> 2308, RFC 2535, RFC 2845, RFC 3425, RFC 3658
> Also: STD 0013
it appears that these may be the only two authoritative
statements on what characters can be in domain names:
RFC 1035: letters, numbers, hyphen
RFC 2181: any binary string, within the length limits (labels of
1 to 63 octets, full names of at most 255 octets)
Does anyone have any more info on what characters are legal
in domain names?