I've updated the SURBL Implementation Guidelines page slightly:
http://www.surbl.org/implementation.html
Implementation Guidelines
Here are some very brief guidelines for folks writing software to use SURBL lists.

Your code should:
1. Extract URIs from message bodies. (Ideally this includes fully resolving redirections to the final target domain name, which can be a non-trivial problem.)
2. Extract base (registrar) domains from those URIs. This means removing any and all leading host names, subdomains, www., randomized subdomains, etc. To determine the base domain it may be necessary to use a table of country code TLDs (ccTLDs), such as the partially incomplete one SURBL uses.
3. Not do name resolution on the domains.
4. Look up the domain name in the SURBL by prepending it to the name of the SURBL, e.g., domainundertest.com.sc.surbl.org, then doing Address (A) record DNS resolution. A non-result indicates lack of inclusion in the list. A result of 127.0.0.2 represents inclusion, i.e., probable spam.
5. Handle numeric IPs in URIs similarly, but reverse the octet ordering before comparison against the RBL. This is standard practice for RBLs. For example, http://1.2.3.4/ is checked as 4.3.2.1.sc.surbl.org.
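As a rough illustration of how a host name or numeric IP might be turned into a query name, here is a minimal Python sketch. The function names and the tiny TWO_LEVEL_SUFFIXES table are hypothetical stand-ins invented for this example (a real implementation would use a much fuller ccTLD table), and redirection resolution is not attempted.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical, deliberately tiny stand-in for a real ccTLD table.
TWO_LEVEL_SUFFIXES = {"co.uk", "org.uk", "com.au", "co.jp"}


def host_from_uri(uri):
    """Return the host part of a URI, or None if there is none."""
    return urlsplit(uri).hostname


def base_domain(host):
    """Strip leading host names and subdomains, keeping the base domain."""
    labels = host.lower().rstrip(".").split(".")
    # Keep three labels when the last two form a known two-level suffix.
    if len(labels) >= 3 and ".".join(labels[-2:]) in TWO_LEVEL_SUFFIXES:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])


def surbl_query_name(host, zone="sc.surbl.org"):
    """Build the name to look up; no DNS resolution happens here."""
    try:
        ip = ipaddress.IPv4Address(host)
    except ValueError:
        # Not a numeric IP: prepend the base domain to the list name.
        return f"{base_domain(host)}.{zone}"
    # Numeric IP: reverse the octet order, as is usual for RBLs.
    reversed_octets = ".".join(reversed(str(ip).split(".")))
    return f"{reversed_octets}.{zone}"
```

With these assumptions, surbl_query_name(host_from_uri("http://1.2.3.4/")) gives 4.3.2.1.sc.surbl.org, matching the example in item 5.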
Unusually, SURBL lists contain both names and numbers in the same list. For example, both 2.0.0.127 and test.surbl.org, along with similar actual spam domains and addresses, are in all SURBL lists. Numbered addresses in SURBLs should have occurred in spam as numbers, e.g., literally http://1.2.3.4/.
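The lookup itself, and a sanity check against the test entries mentioned above, might be sketched in Python as follows. The is_listed helper and its resolve parameter are hypothetical names chosen for this example; socket.gethostbyname is used as one simple way to do an Address record lookup, and a failed lookup is treated as "not listed".

```python
import socket


def is_listed(query_name, resolve=socket.gethostbyname):
    """Return True if the Address record lookup yields 127.0.0.2.

    `resolve` is injectable so the logic can be exercised without
    touching the network (an assumption made for this sketch).
    """
    try:
        address = resolve(query_name)
    except socket.gaierror:
        # A non-result indicates lack of inclusion in the list.
        return False
    return address == "127.0.0.2"


# Possible live sanity check using the test entries (needs network access):
#   is_listed("test.surbl.org.sc.surbl.org")
#   is_listed("2.0.0.127.sc.surbl.org")
```

Passing a different resolve function makes it possible to test the logic offline, or to swap in another DNS library later.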
Would still like comments about anything I may have left out or anything else before I announce it.
Thanks,
Jeff C.