Jeff Chan wrote:
On Tuesday, July 20, 2004, 6:58:15 AM, David Hooton wrote:
On Tue, 20 Jul 2004 15:27:52 +0200, Marc Kool m.kool@vioro.nl wrote:
I did a quick check on a few domains and I do not share your conclusion.
I think we have a slight case of culture clash here. This adult data is meant to be used in a proxy server where the data is apparently matched literally against URI data from web requests, etc.
SURBLs are designed to be used with specific email message body scanning programs that attempt to reduce the domains found in message body URIs to their registrar (base) domain so that subdomains like "models.home.att.net" are reduced to the base domain "att.net" before being included in a SURBL or checked against a SURBL.
This is new for me and it is clear.
The main reason we did this was to defeat the "random subdomain" spammers who generate random subdomains to try to defeat simple URI pattern matching or to key their spams to confirm the recipient addresses. Examples might be "abc1.xyz.spammerdomain.com" and "abc2.xyz.spammerdomain.com". Those we want to reduce to just "spammerdomain.com" since the randomized/keyed versions may occur only once and the sc.surbl.org data engine tries to increase the likelihood of inclusion in the list with an increasing number of reports.
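To make the reduction concrete, here is a minimal sketch in Python of what is described above: hostnames are cut down to their base domain, and reports are then counted per base domain so that randomized subdomains add up instead of each appearing once. The names base_domain, count_reports and TWO_LEVEL_SUFFIXES are made up for illustration, and the suffix set is a tiny stand-in for a real list of two-level TLDs. This is not the actual sc.surbl.org code.

```python
from collections import Counter

# Hypothetical, deliberately tiny set of two-level suffixes.
# A real deployment would need a much longer list.
TWO_LEVEL_SUFFIXES = {"co.uk", "com.au", "co.jp"}


def base_domain(hostname):
    """Reduce a hostname to its registrar (base) domain."""
    labels = hostname.lower().rstrip(".").split(".")
    if len(labels) <= 2:
        return ".".join(labels)
    # Keep three labels when the last two form a known two-level suffix.
    if ".".join(labels[-2:]) in TWO_LEVEL_SUFFIXES:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])


def count_reports(reported_hosts):
    """Count reports per base domain rather than per literal hostname."""
    return Counter(base_domain(h) for h in reported_hosts)
```

With this, "abc1.xyz.spammerdomain.com" and "abc2.xyz.spammerdomain.com" count as two reports against "spammerdomain.com", and "models.home.att.net" is reduced to "att.net".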
It may be useful to read about the sc.surbl.org data:
Yep, the reasons why this is done are clear, but the approach is not flawless. Some ISPs (say, myisp.net) give each customer a subdomain, e.g. myspamsite.myisp.net, which cannot be included in SURBL. I also assume that the percentage of this type of domain is not so big...
*snip*
Given my very quick testing, I think it would probably be worth giving this data a try. We would most likely need to work out how to remove the subdomained entries: the list is huge, and any efficiency we can gain by removing excess data would obviously be useful.
Good suggestion, but perhaps slightly tricky to implement, depending on the data.
I can easily use a regex to delete entries with subdomains like "xxxmovies.home.att.net" so that "att.net" does not get on the list. But that would only be effective if the deliberately randomized domains like "abc.xyz.spammerdomain.com" were reduced to "spammerdomain.com" in the source data, otherwise we would lose both.
In other words, if the data is a literal transcription of everything found in spams, including randomized URIs like "abc.xyz.spammerdomain.com," then we will lose the latter if I discard all subdomains.
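A small Python sketch of the problem, assuming the list is simply one hostname per line. The pattern and the function name drop_subdomain_entries are illustrative only, and the pattern treats any hostname with three or more labels as "has a subdomain" (so it ignores two-level TLDs such as co.uk):

```python
import re

# Any hostname with three or more dot-separated labels.
HAS_SUBDOMAIN = re.compile(r"(?:[^.]+\.){2,}[^.]+")


def drop_subdomain_entries(entries):
    """Keep only entries that are already bare base domains."""
    return [e for e in entries if not HAS_SUBDOMAIN.fullmatch(e)]


entries = [
    "xxxmovies.home.att.net",     # ISP-hosted page: dropping it keeps att.net off the list
    "abc.xyz.spammerdomain.com",  # randomized spammer host: dropped as well
    "adultsite.com",              # already a base domain: kept
]
kept = drop_subdomain_entries(entries)
```

Only "adultsite.com" survives here. The att.net entry is gone as intended, but so is the spammer entry, unless "spammerdomain.com" also appears in the source data in its reduced form.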
So Marc, can you tell us if the randomized domains that spammers frequently use are reduced to the base domains in the adult data, i.e. "spammerdomain.com" and not "abc.xyz.spammerdomain.com"?
Nope :-(
Jeff C.
There are indeed "different cultures":
surbl: fight spam, of which lots is adult related
squidguard: block adult sites, of which only a small percentage spams
_I assume that most sites that (want to) fight spam also (want to) block adult sites_.
For the record: my original proposal would make sex.surbl.org more of a squidguard-based list than a surbl-based list.
One of the reasons to propose sex.surbl.org was the fact that the SURBL lists lag behind reality. In July I received 156 spams, of which 16 were not detected by SA+SOME_SARE_RULES+OWN_RULES+SURBL because the SURBL lists were not updated fast enough (the 16 spams were marked as spam at a later time because by then SURBL marked them and the SA rating went up). This is not meant to criticize anybody, just to state a fact.
I observed that many spams from new domains (which share IP addresses) automatically forward you to a known sex site (one in the squidguard database), and so I proposed sex.surbl.org.
I hate to say it :-) but if the implementation gives too many headaches, the proposal as it is now can be disregarded.
However, I see some value for the squidguard adult database to be used by software behind spamtraps: if a URI is retrieved and redirects you to a known sex site, the URI can be added automatically (= fast) to a SURBL list.
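Roughly what I have in mind, as a Python sketch and not a finished tool. Everything here is an assumption for illustration: resolve_redirect stands for whatever the spamtrap software uses to follow a URI to its final location (it is passed in, so no network code is shown), known_adult_hosts stands for the hosts loaded from the squidguard adult database, and the function names are made up.

```python
from urllib.parse import urlsplit


def host_of(uri):
    """Return the lowercased hostname of a URI, or '' if it has none."""
    return (urlsplit(uri).hostname or "").lower()


def candidates_for_listing(spam_uris, resolve_redirect, known_adult_hosts):
    """Return hosts from spam URIs whose redirect lands on a known adult host.

    resolve_redirect is a hypothetical callable: it takes a URI and returns
    the final URL after redirects, or None if there is no redirect.
    """
    found = []
    for uri in spam_uris:
        target = resolve_redirect(uri)
        if target and host_of(target) in known_adult_hosts:
            host = host_of(uri)
            if host and host not in found:
                found.append(host)
    return found
```

The hosts returned would still need the usual reduction to base domains and a whitelist check before going anywhere near a SURBL list.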
Marc