SURBL has a main list of abuse, phishing, malware, and cracked hosts (domains and IPs). Most of the abuse hosts are used for spam. Cracked hosts tend to be used for spam, phishing, malware, botnets, DDoS, etc. SURBL also makes full URI data available in several ways. Both types of data may be useful to you, but it is probably simplest to start with the host data and then move on to URIs. There is also a logical order to the lookups: check our host data first, then check our URI data for deeper information where available. (Not all blacklisted hosts have corresponding blacklisted URIs, and vice versa.)
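A minimal sketch of that host-first, URI-second lookup, assuming both datasets have already been downloaded into in-memory Python sets. The names `check_url`, `host_list`, `uri_list`, and `Verdict` are hypothetical and not part of any SURBL interface.

```python
from dataclasses import dataclass, field
from typing import Optional
from urllib.parse import urlsplit


@dataclass
class Verdict:
    host: str
    listed_domain: Optional[str]                   # host-list entry that matched, if any
    uri_hits: list = field(default_factory=list)   # URI-list entries on the same host


def _host_of(url: str) -> str:
    return (urlsplit(url).hostname or "").lower().rstrip(".")


def parent_domains(host: str):
    """Yield the host and each parent: a.b.example.com, b.example.com, example.com, com."""
    labels = host.split(".")
    for i in range(len(labels)):
        yield ".".join(labels[i:])


def check_url(url: str, host_list: set, uri_list: set) -> Verdict:
    host = _host_of(url)
    # Step 1: host data. Match the host itself or any parent domain.
    listed = next((d for d in parent_domains(host) if d in host_list), None)
    # Step 2: URI data for deeper detail. Checked even when step 1 misses,
    # because listed hosts and listed URIs do not always overlap.
    hits = sorted(u for u in uri_list if _host_of(u) == host)
    return Verdict(host, listed, hits)
```

Walking every parent label is a simplification for illustration; a real deployment would more likely reduce the host to its registered domain first (for example with the Public Suffix List) before consulting the host list.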
OK, got it. I was thinking of parsing URIs only; now I know better. URIs are useful for verifying that a listing is not a false positive.
We can make reports about specific TLDs, for example .br, or even about Brazilian brands, but .br domains are also trivially searchable in our main host blacklist.
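For example, pulling the .br entries out of the main host list can be a plain suffix filter. This sketch assumes a hypothetical local copy of the list with one host per line and `#` comment lines; the actual distribution format may differ, and `hosts_in_tld` is an illustrative name.

```python
from typing import Iterable, List


def hosts_in_tld(lines: Iterable[str], tld: str) -> List[str]:
    """Return the distinct host-list entries that fall under `tld` (e.g. "br")."""
    suffix = "." + tld.lower().strip(".")
    found = set()
    for line in lines:
        host = line.strip().lower().rstrip(".")
        # Skip blank lines and (assumed) comment lines.
        if not host or host.startswith("#"):
            continue
        # IP entries such as 1.2.3.4 do not end in an alphabetic TLD,
        # so they never match the suffix.
        if host.endswith(suffix):
            found.add(host)
    return sorted(found)
```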
It's usually simpler to parse the full data than to ask for a specific subset. But if a specific subset is all that the source is willing to make available, then we can live with that; we have done it both ways with other data feeds. Having the complete data, though, brings one interesting benefit: if a domain registrant asks for a CNAME or HTTP redirection to a domain under a different TLD, having the full dataset rather than a per-TLD slice helps prevent those redirections from ever being provisioned.
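A sketch of that provisioning check, assuming the registry holds the complete host blacklist as a set of lowercase domains. `may_provision` and `redirect_target_host` are hypothetical names, not an existing registry API.

```python
from urllib.parse import urlsplit


def redirect_target_host(target: str) -> str:
    """CNAME targets are bare hostnames; HTTP redirect targets are full URLs."""
    host = urlsplit(target).hostname if "://" in target else target
    return (host or "").lower().rstrip(".")


def may_provision(target: str, blacklist: set) -> bool:
    """Refuse the CNAME/redirect if the target or any parent domain is listed.

    `blacklist` is assumed to be the complete host list, not a per-TLD
    slice, so targets under any TLD (.com, .net, ...) are covered.
    """
    labels = redirect_target_host(target).split(".")
    return not any(".".join(labels[i:]) in blacklist for i in range(len(labels)))
```

With only a .br slice, a target such as `cdn.evil.com` would pass unchecked; the full list is what lets the cross-TLD case be caught.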
Rubens