We've made a document describing some of the general properties that code using SURBLs should have in order to use the data as it was designed and intended. We hope these comments will be useful to developers. Our Implementation Guidelines are brief and are copied below.
http://www.surbl.org/implementation.html
Implementation Guidelines
Here are some very brief guidelines for folks writing software to use SURBL lists. Your code should:
1. Extract URIs from message bodies. (Ideally, extraction should include full resolution of redirections to the final target domain name. This can be a non-trivial problem.)
2. Extract base (registrar) domains from those URIs. This includes removing any and all leading host names, subdomains, www., randomized subdomains, etc. In order to determine the base domain it may be necessary to use a table of country code TLDs (ccTLDs) such as the partially incomplete one SURBL uses.
3. Not do name resolution on the domains.
4. Look up the domain name in the SURBL by prepending it to the name of the SURBL, e.g., domainundertest.com.sc.surbl.org, then doing an Address (A) record DNS lookup on the resulting combined name. No result indicates that the domain is not in the list. A result of 127.0.0.2 indicates inclusion, i.e., probable spam.
5. Handle numeric IPs in URIs similarly, but reverse the octet ordering before comparison against the RBL. This is standard practice for RBLs. For example, http://1.2.3.4/ is checked as 4.3.2.1.sc.surbl.org.
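Steps 1, 2, 4 and 5 above could be sketched roughly as follows. This is only an illustration, not the code SURBL itself uses: the function names, the regular expression and the tiny TWO_LEVEL_SUFFIXES table are all assumptions made for the example, and it makes no attempt to resolve redirections. A real implementation would need a far more complete ccTLD table.

```python
import ipaddress
import re
from urllib.parse import urlsplit

# Hypothetical, deliberately tiny stand-in for a real ccTLD table.
TWO_LEVEL_SUFFIXES = {"co.uk", "org.uk", "com.au", "co.jp"}

# Simplistic URI matcher (an assumption); real mail needs more care.
URI_PATTERN = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def extract_hosts(body):
    """Return the host part of each http(s) URI found in a message body."""
    hosts = []
    for uri in URI_PATTERN.findall(body):
        try:
            host = urlsplit(uri).hostname
        except ValueError:
            continue  # malformed URI, skip it
        if host:
            hosts.append(host.rstrip("."))
    return hosts


def base_domain(host):
    """Strip leading host names and subdomains, keeping the base domain."""
    labels = host.lower().split(".")
    if len(labels) >= 3 and ".".join(labels[-2:]) in TWO_LEVEL_SUFFIXES:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])


def query_name(host, zone="sc.surbl.org"):
    """Build the name to look up: reversed octets for IPs, base domain otherwise."""
    try:
        ip = ipaddress.IPv4Address(host)
    except ValueError:
        return base_domain(host) + "." + zone
    return ".".join(reversed(str(ip).split("."))) + "." + zone
```

With these definitions, query_name("www.domainundertest.com") gives domainundertest.com.sc.surbl.org and query_name("1.2.3.4") gives 4.3.2.1.sc.surbl.org, matching the examples in steps 4 and 5. Note that no name resolution is done on the extracted domain itself, per step 3.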
Unusually, SURBL lists have both names and numbers in the same list. For example, the test entries 2.0.0.127 and test.surbl.org appear in all SURBL lists, alongside actual spam domains and addresses. Numeric addresses listed in SURBLs should have occurred in spams as numbers, e.g., literally http://1.2.3.4/. Additional SURBL test points are mentioned in the News & Notes section.
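A lookup against the test points might be sketched like this. It is a minimal example under stated assumptions: is_listed and its resolve parameter are names invented here, and the standard library's socket.gethostbyname stands in for whatever DNS client your software actually uses. It treats any resolution failure as "not listed", which a production implementation would probably want to distinguish from a DNS outage.

```python
import socket


def is_listed(name, zone="sc.surbl.org", resolve=socket.gethostbyname):
    """Return True if `name` (a base domain or reversed IP) is in the zone.

    `resolve` is injectable so the logic can be exercised without a network.
    """
    try:
        answer = resolve(name + "." + zone)
    except socket.gaierror:
        # No A record came back: treat as not listed.
        # (This sketch cannot tell NXDOMAIN apart from a DNS failure.)
        return False
    # Per the guidelines, 127.0.0.2 represents inclusion.
    return answer == "127.0.0.2"


# Example use against the test points (needs working DNS):
#   is_listed("test.surbl.org")
#   is_listed("2.0.0.127")
```

Both test points are passed in the form in which they appear in the list, so the numeric one is already in reversed-octet order.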
Please send me any comments, updates, revisions, corrections, questions, etc.
Jeff C.