I took the top 50th percentile of the multiple-version-intersected DMOZ domains and matched them with the top 70th percentile of the multiple-version-intersected wikipedia domains:
http://spamcheck.freeapp.net/whitelists/dmoz-50thpercentile.srt
http://spamcheck.freeapp.net/whitelists/wikipedia-70thpercentile.srt
resulting in a list of about 15k domains:
http://spamcheck.freeapp.net/whitelists/percentile-wikipedia-dmoz.srt
(first column is word counts:)
255838 255838 3955772 dmoz-50thpercentile.srt
28323 28323 402512 wikipedia-70thpercentile.srt
14982 14982 202925 percentile-wikipedia-dmoz.srt
that appeared at least two or three times in both data sources. (Percentiles were chosen to give about two or three hits of domains within the same source.) I then matched those 15k against the list of all SURBL domains and got the following hits, all possible FPs:
http://spamcheck.freeapp.net/whitelists/percentile-wikipedia-dmoz-blocklist....
0catch.com 1asphost.com 741.com 8bit.co.uk 8m.net anzwers.org arena.ne.jp away.com centralhome.com cheapass.com f2g.net faithweb.com fateback.com fortunecity.de freewebpage.org galeon.com htmlplanet.com i8.com itgo.com iwarp.com kit.net kki.net.pl ledger-enquirer.com nana.co.il online-dictionary.biz p5.org.uk quuxuum.org republika.pl s5.com spaceports.com t35.com telepolis.com transnationale.org up.co.il xiloo.com zip.net zonai.com
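The selection described above (keep domains mentioned at least a few times in each source, intersect the two sources, then match the result against listed domains) can be sketched roughly as follows. This is only an illustration, not the scripts actually used: the function names, the `min_hits` threshold, and the sample data are assumptions, and the real percentile cut may have been computed differently.

```python
from collections import Counter


def frequent_domains(mentions, min_hits=2):
    """Return the set of domains mentioned at least min_hits times."""
    counts = Counter(d.strip().lower() for d in mentions if d.strip())
    return {domain for domain, n in counts.items() if n >= min_hits}


def possible_fps(dmoz_mentions, wikipedia_mentions, blocklist, min_hits=2):
    """Domains frequent in both sources that also appear in the blocklist."""
    in_both = frequent_domains(dmoz_mentions, min_hits) & frequent_domains(
        wikipedia_mentions, min_hits
    )
    return sorted(in_both & {d.strip().lower() for d in blocklist})


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    dmoz = ["example.com", "example.com", "example.org", "example.net", "example.net"]
    wiki = ["example.com", "example.com", "example.net", "example.net", "example.org"]
    listed = ["example.net", "example.org"]
    print(possible_fps(dmoz, wiki, listed))
```

Anything returned by `possible_fps` is only a candidate: it is well represented in both ham sources and is also listed, so it still needs a manual check.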
One additional test I'd like to apply to all these data is to remove any that are listed in SBL, but I haven't coded that up yet. However, Ryan Thompson's GetURI does include an SBL check, along with other goodies like domain age, so I fed these into his CGI version, with the results at:
http://ry.ca/cgi-bin/geturi.cgi?id=ham-es0EnYAUmBru8HxYCzvQ5x
It looks like all are between three and ten years old, aside from:
fortunecity.de 2.7 years old 16 NANAS
online-dictionary.biz 214 days 0 NANAS
nana.co.il 552 days 754 NANAS
1asphost.com 786 days 60 NANAS
And only these two had SBL hits:
8bit.co.uk 1796 days 131 NANAS
xiloo.com 1667 days 342 NANAS
Aside from those two, the rest may be candidates for whitelisting, though I did not check them further. (Note also that GetURI does not count NANAS; I did those few manually.)
May I ask for some help in checking these?
Note that we should still continue to check the DMOZ hits since there are probably some more FPs in there also:
http://spamcheck.freeapp.net/whitelists/dmoz-blocklist.summed.txt
http://spamcheck.freeapp.net/whitelists/dmoz-blocklist.txt
http://spamcheck.freeapp.net/whitelists/dmoz-blocklist.ws
1338 13380 141946 dmoz-blocklist1.summed.txt
1338 1338 20533 dmoz-blocklist1.txt
1173 11730 124298 dmoz-blocklist1.ws
Most are in WS.
Jeff C. -- "If it appears in hams, then don't list it."
Jeff,
As usual, this stuff is moving so fast, I can hardly keep up.
The following was the **original** list of Whitelist candidates:
http://spamcheck.freeapp.net/whitelists/dmoz-blocklist.txt
Can you give us an update as to which of these have now been whitelisted, which are still candidates for whitelisting, and which have been decided against whitelisting? I just want to make sure that the original list wasn't whitelisted across the board... and if it was, I'd like to know so that I can make better decisions about which of these to keep in my private blocklist.
Also, while testing against this list, I did find a couple of FPs (as I previously reported). However, I also found much spam. Would it still be helpful for me to post to this discussion a list of those URIs which DID catch spam (from the original list)? Instead, I've been making it my priority to find and report FPs... but don't let my lack of reporting spam "hits" fool you. This original list did catch many spams.
Or, perhaps I should move on and just start fresh applying this same kind of testing to the new list below:
http://spamcheck.freeapp.net/whitelists/percentile-wikipedia-dmoz-blocklist.txt
I do notice some overlap between the two lists.
Sorry this post was "all over the map"... just answer as best you can and we'll go from there.
Thanks,
Rob McEwen
-----Original Message-----
From: discuss-bounces@lists.surbl.org [mailto:discuss-bounces@lists.surbl.org] On Behalf Of Jeff Chan
Sent: Saturday, October 09, 2004 6:52 AM
To: SURBL Discuss
Subject: [SURBL-Discuss] Took top percentiles of DMOZ and wikipedia domains,some results
On Saturday, October 9, 2004, 6:27:00 AM, Rob McEwen wrote:
As usual, this stuff is moving so fast, I can hardly keep up.
The following was the **original** list of Whitelist candidates:
Actually, these are the revised matches from October 8 with the domains intersected (joined) across three different snapshots of DMOZ data from a time period spanning about a month. Since that intersection was joined against SURBLs a couple days later than the original one, this later version has a few whitelisted records already removed.
Probably confusingly, I renamed the original one (from October 6), based on a single snapshot of dmoz, to:
http://spamcheck.freeapp.net/whitelists/dmoz-blocklist1.txt
(It's reasonable to assume that would be a later version, but it's an older one. I usually give the latest file the original name, and add a revision number to the name of the old version with the number incremented as each "current" one gets archived, i.e. the previous current one would become dmoz-blocklist2.txt when a new current one replaces it.)
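To make the naming convention concrete, here is a minimal sketch of how the archive name for the outgoing "current" file could be picked. The helper name and its arguments are hypothetical, and the renaming was most likely done by hand; this only illustrates the numbering rule described above.

```python
import re


def next_archive_name(base, ext, existing):
    """Return base + next revision number + ext, given the names already present.

    The unnumbered file (base + ext) is always the current version, so it is
    ignored when looking for the highest revision number in use.
    """
    pattern = re.compile(re.escape(base) + r"(\d+)" + re.escape(ext))
    revisions = []
    for name in existing:
        match = pattern.fullmatch(name)
        if match:
            revisions.append(int(match.group(1)))
    return f"{base}{max(revisions, default=0) + 1}{ext}"


if __name__ == "__main__":
    files = ["dmoz-blocklist.txt", "dmoz-blocklist1.txt"]
    # The current dmoz-blocklist.txt would be archived under this name
    # before a newer file takes over the unnumbered name.
    print(next_archive_name("dmoz-blocklist", ".txt", files))
```

Under this rule the lowest number is always the oldest snapshot, which is the opposite of what many readers would guess from the file names.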
I hope there isn't too much version confusion, but most of the matches remain unchanged regardless.
Can you give us an update as to which of these have now been whitelisted, which are still candidates for whitelisting, and which have been decided against whitelisting? I just want to make sure that the original list wasn't whitelisted across the board... and if it was, I'd like to know so that I can make better decisions about which of these to keep in my private blocklist.
No bulk lists have been whitelisted, only individual FPs specifically checked and reported based on the matches. Certainly those whitelisted FPs are only a small fraction of the DMOZ matches so far. Therefore most of the matches in the current version still need to be checked.
Also, while testing against this list, I did find a couple of FPs (as I previously reported). However, I also found much spam. Would it still be helpful for me to post to this discussion a list of those URIs which DID catch spam (from the original list)?
Yes. It's useful to know how spammy the DMOZ domains are.
Instead, I've been making it my priority to find and report FPs... but don't let my lack of reporting spam "hits" fool you. This original list did catch many spams.
Both spams and FPs are of interest. FPs are probably more urgent to detect and get out of the data however.
Or, perhaps I should move on and just start fresh applying this same kind of testing to the new list below:
http://spamcheck.freeapp.net/whitelists/percentile-wikipedia-dmoz-blocklist.txt
This is a much smaller list with only 37 matches between the percentiled (much smaller) wikipedia and dmoz lists and SURBLs. Given that it has tighter inclusion criteria, I think it would be good to focus on finding FPs to whitelist in these first, then go back to the larger list of DMOZ matches.
Hope this helps, and thanks much for your help. Multiple opinions on these 37 would be welcomed. Comparing notes could be useful.
Jeff C. -- "If it appears in hams, then don't list it."
OK for completeness, or to thoroughly compound the confusion, ;-) here are joins of the (much smaller) percentiled dmoz and wikipedia lists:
http://spamcheck.freeapp.net/whitelists/dmoz-50thpercentile.srt
255838 255838 3955772 dmoz-50thpercentile.srt
http://spamcheck.freeapp.net/whitelists/wikipedia-70thpercentile.srt
28323 28323 402512 wikipedia-70thpercentile.srt
against the SURBL whitelist and blocklist domains (and WS):
http://spamcheck.freeapp.net/whitelists/dmoz-50thpercentile-whitelist.txt
http://spamcheck.freeapp.net/whitelists/dmoz-50thpercentile-blocklist.txt
http://spamcheck.freeapp.net/whitelists/dmoz-50thpercentile-blocklist.summed...
http://spamcheck.freeapp.net/whitelists/dmoz-50thpercentile-blocklist.ws
2962 2962 36518 dmoz-50thpercentile-whitelist.txt
236 236 3312 dmoz-50thpercentile-blocklist.txt
236 2360 24355 dmoz-50thpercentile-blocklist.summed.txt
233 2330 24044 dmoz-50thpercentile-blocklist.ws
http://spamcheck.freeapp.net/whitelists/wikipedia-70thpercentile-whitelist.t...
http://spamcheck.freeapp.net/whitelists/wikipedia-70thpercentile-blocklist.t...
http://spamcheck.freeapp.net/whitelists/wikipedia-70thpercentile-blocklist.s...
http://spamcheck.freeapp.net/whitelists/wikipedia-70thpercentile-blocklist.w...
1260 1260 14702 wikipedia-70thpercentile-whitelist.txt
47 47 574 wikipedia-70thpercentile-blocklist.txt
47 470 4685 wikipedia-70thpercentile-blocklist.summed.txt
45 450 4471 wikipedia-70thpercentile-blocklist.ws
One reason I didn't mention these before is because they're kind of mid-way between the larger lists and the smaller one combining them all (with 37 records), so I didn't want to focus on them.
For comparison purposes, the percentiled lists are much smaller than the non-percentiled ones, because there are many domains in each corpus with only one entry. Here are the original (un-percentiled) sizes compared with the percentiled ones:
http://spamcheck.freeapp.net/whitelists/dmoz.srt
http://spamcheck.freeapp.net/whitelists/dmoz-50thpercentile.srt
2300851 2300851 38065969 dmoz.srt
255838 255838 3955772 dmoz-50thpercentile.srt
http://spamcheck.freeapp.net/whitelists/wikipedia.srt
http://spamcheck.freeapp.net/whitelists/wikipedia-70thpercentile.srt
173828 173828 2633441 wikipedia.srt
28323 28323 402512 wikipedia-70thpercentile.srt
So you can see why the matches of the percentiled data against SURBLs are fewer.
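For a rough sense of scale, the counts quoted above can be turned into the share of domains that survive the percentile cut. This is just arithmetic on those numbers, assuming one domain per line in each .srt file; the helper name is made up for the example.

```python
# Domain counts (first wc column) for the full and percentiled lists quoted above.
sizes = {
    "dmoz": (2300851, 255838),
    "wikipedia": (173828, 28323),
}


def kept_percent(total, kept):
    """Percentage of domains that remain after the percentile cut."""
    return round(100.0 * kept / total, 1)


if __name__ == "__main__":
    for name, (total, kept) in sizes.items():
        print(f"{name}: {kept_percent(total, kept)}% of domains kept")
```

So roughly one DMOZ domain in nine and one Wikipedia domain in six remain after the cut.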
Jeff C. -- "If it appears in hams, then don't list it."
Jeff Chan wrote to SURBL Discuss:
xiloo.com 1667 days 342 NANAS
Aside from those two, the rest may be candidates for whitelisting, though I did not check them further. (Note also that GetURI does not count NANAS; I did those few manually.)
Hi Jeff,
The reason for that is twofold:
1. Obtaining that information automatically, although relatively easy and good for a fun time, expressly violates Google's ToS. They have a client library for automated queries, but it only allows 1000 queries per account per day, and doesn't yet work with Google Groups.
2. As we know, raw NANAS counts can be extremely misleading. For instance, as you pointed out a few days ago, yahoo.com has > 0.5M NANAS hits.
By forcing someone to click on the "[ NANAS ]" link, GetURI plays nicely with Google, and encourages people to hand-check NANAS hits to look for spamvertised examples. I'd worry that with raw counts automatically displayed, some would draw false conclusions from "xx NANAS".
May I ask for some help in checking these?
Sure, I'll peek at a few right away.
- Ryan
On Saturday, October 9, 2004, 11:14:59 AM, Ryan Thompson wrote:
- Obtaining that information automatically, although relatively easy and good for a fun time, expressly violates Google's ToS. They have a client library for automated queries, but it only allows 1000 queries per account per day, and doesn't yet work with Google Groups.
- As we know, raw NANAS counts can be extremely misleading. For instance, as you pointed out a few days ago, yahoo.com has > 0.5M NANAS hits.
By forcing someone to click on the "[ NANAS ]" link, GetURI plays nicely with Google, and encourages people to hand-check NANAS hits to look for spamvertised examples. I'd worry that with raw counts automatically displayed, some would draw false conclusions from "xx NANAS".
Fair enough. :-)
One feature to consider adding to GetURI: if it doesn't already, could it check www.domain.com against SBL?
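As an illustration of what such a check involves (not a description of how GetURI is implemented), SBL is an IP-based DNSBL, so both the bare domain and its www host have to be resolved and each address looked up under the SBL zone. The sketch below assumes the usual DNSBL convention of reversed IPv4 octets under `sbl.spamhaus.org` and treats an answer in 127.0.0.x as a listing; the function names are hypothetical, and results depend on the resolver being allowed to query Spamhaus.

```python
import ipaddress
import socket

SBL_ZONE = "sbl.spamhaus.org"


def hosts_to_check(domain):
    """Both the bare domain and its www host, as suggested above."""
    return [domain, "www." + domain]


def dnsbl_query_name(ip, zone=SBL_ZONE):
    """Build the DNSBL query name: reversed IPv4 octets followed by the zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_sbl_listed(host):
    """Resolve host, then look its address up in the SBL zone (needs network)."""
    try:
        ip = socket.gethostbyname(host)
    except socket.gaierror:
        return False  # host does not resolve, so there is nothing to check
    try:
        answer = socket.gethostbyname(dnsbl_query_name(ip))
    except socket.gaierror:
        return False  # no record in the zone: not listed
    # Assumption: 127.0.0.x answers indicate a listing; other answers are ignored.
    return answer.startswith("127.0.0.")


if __name__ == "__main__":
    for host in hosts_to_check("example.com"):
        print(host, is_sbl_listed(host))
```

Checking only the first resolved address is a simplification; a host with several A records would need each one looked up.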
Jeff C. -- "If it appears in hams, then don't list it."
On Saturday, October 9, 2004, 11:14:59 AM, Ryan Thompson wrote:
- Obtaining that information automatically, although relatively easy and good for a fun time, expressly violates Google's ToS. They have a client library for automated queries, but it only allows 1000 queries per account per day, and doesn't yet work with Google Groups.
BTW, it's nice to use Google for consistency, but NANAS is a Usenet newsgroup and should therefore be available in web form from many sources beyond just Google.
Jeff C. -- "If it appears in hams, then don't list it."
Jeff Chan wrote to SURBL Discuss:
0catch.com
Free web space
1asphost.com
741.com
Free/paid web host.
8bit.co.uk
Blank index on main (www) site. Subdomain only?
anzwers.org
Unlimited free web space.
f2g.net
Free web hosting.
faithweb.com fateback.com fortunecity.de freewebpage.org
Another free host.
galeon.com
Can't translate, but looks like another hoster.
8m.net iwarp.com htmlplanet.com itgo.com
More free hosts. These all redirect to freeservers.com (not listed)
i8.com
50megs.com free host
Ok, is it just coincidence that all of these so far are free hosts?
- Ryan
On Saturday, October 9, 2004, 11:29:42 AM, Ryan Thompson wrote:
Ok, is it just coincidence that all of these so far are free hosts?
- Ryan
Yes those all look like free hosting providers. Should we whitelist them? I think so. I also think they're probably not used by major spammers.
It's perhaps worth noting that most of these free sites seem to have some legitimate uses and the free hosting providers (themselves) probably should not be listed. I see music, movie, literature, etc. sites hosted on them in the DMOZ and wikipedia data. Mostly they look like personal fan and hobby pages, which seems reasonable for free hosts.
Jeff C. -- "If it appears in hams, then don't list it."
On Saturday, October 9, 2004, 6:32:11 PM, Jeff Chan wrote:
On Saturday, October 9, 2004, 11:29:42 AM, Ryan Thompson wrote:
Ok, is it just coincidence that all of these so far are free hosts?
- Ryan
Yes those all look like free hosting providers. Should we whitelist them?
Since there was no other comment, I went ahead and whitelisted these free hosting sites that Ryan helped identify. I also added made.com, which Andy Warner's Korean friend helped identify as a free hosting site. The resulting list is:
http://spamcheck.freeapp.net/whitelists/freehosts.sort
0catch.com 1asphost.com 741.com 8bit.co.uk anzwers.org f2g.net faithweb.com fateback.com fortunecity.de freewebpage.org galeon.com 8m.net iwarp.com htmlplanet.com itgo.com i8.com made.com
(FWIW I checked these again myself to confirm they're all hosting sites, and they are. A couple, like htmlplanet, belong to "freeservers.com")
We still need help identifying the other potential FPs from the percentiled top DMOZ and wikipedia list:
http://spamcheck.freeapp.net/whitelists/percentile-wikipedia-dmoz-blocklist....
arena.ne.jp away.com centralhome.com cheapass.com kit.net kki.net.pl ledger-enquirer.com nana.co.il online-dictionary.biz p5.org.uk quuxuum.org republika.pl s5.com spaceports.com t35.com telepolis.com transnationale.org up.co.il xiloo.com zip.net zonai.com
See the first message in this thread for a little more information about some of these:
http://lists.surbl.org/pipermail/discuss/2004-October/003169.html
Jeff C. -- "If it appears in hams, then don't list it."
On Tuesday, October 12, 2004, 1:16:13 AM, Chris wrote:
s5.com is freeservers too.
Indeed, as are all of:
4mg.com 4t.com 50megs.com 5u.com 8k.com 8m.com faithweb.com freehosting.net freeservers.com htmlplanet.com i8.com itgo.com iwarp.com s5.com tvheaven.com
I'm whitelisting them all. The only ones blacklisted previously were:
4mg.com s5.com
Jeff C. -- "If it appears in hams, then don't list it."
On Monday, October 11, 2004, 4:59:24 PM, Jeff Chan wrote:
We still need help identifying the other potential FPs from the percentiled top DMOZ and wikipedia list:
http://spamcheck.freeapp.net/whitelists/percentile-wikipedia-dmoz-blocklist....
arena.ne.jp away.com centralhome.com cheapass.com kit.net kki.net.pl ledger-enquirer.com nana.co.il online-dictionary.biz p5.org.uk quuxuum.org republika.pl s5.com spaceports.com t35.com telepolis.com transnationale.org up.co.il xiloo.com zip.net zonai.com
See the first message in this thread for a little more information about some of these:
http://lists.surbl.org/pipermail/discuss/2004-October/003169.html
Would anyone like to help check these potential FP domains?
Jeff C. -- "If it appears in hams, then don't list it."
Would anyone like to help check these potential FP domains?
Jeff,
The list in the URL is larger than the list in your post. I presume because the list in your post excludes those which have since been whitelisted?
Anyway, I ran a "rule" on my spam filter which would archive any e-mail going to my server which had any of these domains in the body of the message. About 7,000 messages go through my server daily (~40% legit/~60% spam). Over the course of at least a few days, not a single message (ham or spam) contained any of the domains in that list.
Also, to be sure, I tested it by sending myself an e-mail with one of these domains in it and, sure enough, the message DID get copied to the correct folder for review... but, as I said, no "real" messages going through my mail server contained any of these domains over the course of 4 or 5 days.
I finally turned that particular "rule" off this morning.
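For anyone who wants to repeat this kind of test, a body-scanning rule along these lines can be sketched as below. It is not the rule from my filter, only an approximation: the function names are made up, and the matching is a plain regular expression that accepts a listed domain on its own or as the tail of a subdomain, but not embedded in a longer name.

```python
import re


def build_domain_matcher(domains):
    """Compile a regex matching any listed domain, bare or as a subdomain suffix."""
    names = sorted({d.lower() for d in domains}, key=len, reverse=True)
    if not names:
        raise ValueError("at least one domain is required")
    alternatives = "|".join(re.escape(name) for name in names)
    # Reject hits inside longer names such as "nots5.com", "s5.company",
    # or "s5.com.example.org"; a trailing sentence period is still allowed.
    return re.compile(
        r"(?<![a-z0-9-])(?:" + alternatives + r")(?![a-z0-9-]|\.[a-z0-9])",
        re.IGNORECASE,
    )


def matched_domains(body, matcher):
    """Return the sorted, lowercased listed domains found in a message body."""
    return sorted({m.group(0).lower() for m in matcher.finditer(body)})


if __name__ == "__main__":
    matcher = build_domain_matcher(["s5.com", "kit.net"])
    body = "See http://members.s5.com/page and KIT.NET."
    hits = matched_domains(body, matcher)
    if hits:
        print("archive for review:", hits)
```

A message with any hit would be copied to a review folder; an empty result means the message is left alone.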
Absent compelling evidence otherwise, I'd suggest at least removing these from SURBL... maybe even whitelisting if there are no other objections and if they don't have SpamHaus records.
Rob McEwen
On Wednesday, October 13, 2004, 9:30:52 PM, Rob McEwen wrote:
Would anyone like to help check these potential FP domains?
The list in the URL is larger than the list in your post. I presume because the list in your post excludes those which have since been whitelisted?
Anyway, I ran a "rule" on my spam filter which would archive any e-mail going to my server which had any of these domains in the body of the message. About 7,000 messages go through my server daily (~40% legit/~60% spam). Over the course of at least a few days, not a single message (ham or spam) contained any of the domains in that list.
Also, to be sure, I tested it by sending myself an e-mail with one of these domains in it and, sure enough, the message DID get copied to the correct folder for review... but, as I said, no "real" messages going through my mail server contained any of these domains over the course of 4 or 5 days.
I finally turned that particular "rule" off this morning.
Absent compelling evidence otherwise, I'd suggest at least removing these from SURBL... maybe even whitelisting if there are no other objections and if they don't have SpamHaus records.
Rob McEwen
Thanks Rob. Yes, the list in the URL was from all of the SURBL hits against the top percentile of mentions in Wikipedia and DMOZ. About half of them were already whitelisted as free hosting sites thanks to Ryan checking on them.
Thanks for the feedback that none of the domains on the list appeared in several days of messages to your server.
Since I may not have access to the actual data sources on these, I may not be able to remove them (or see spams for them) and would therefore need to whitelist them to get them out of SURBLs.
Also, presumably someone had a reason for listing these in the first place, so we should probably research them to confirm whether or not they are spammy before whitelisting. I'll do it if no one else would like to help, but would appreciate some help from others....
They (or I :-) should probably make use of the policy we're about to publish for manual listing:
http://www.surbl.org/policy.html
Jeff C. -- "If it appears in hams, then don't list it."
On Wednesday, October 13, 2004, 9:00:29 PM, Jeff Chan wrote:
On Monday, October 11, 2004, 4:59:24 PM, Jeff Chan wrote:
See the first message in this thread for a little more information about some of these:
http://lists.surbl.org/pipermail/discuss/2004-October/003169.html
Thanks to Ryan and others for identifying most of the free hosting sites in the full list. We whitelisted those free sites earlier.
We still need help identifying the other potential FPs from the percentiled top DMOZ and wikipedia list:
http://spamcheck.freeapp.net/whitelists/percentile-wikipedia-dmoz-blocklist....
I went ahead and checked the remainder:
arena.ne.jp
1997 domain, web hosting site belonging to Japan's national telco NTT, no SBL, 72 NANAS - mostly sender addresses and a few abuse reporting addresses. 36 DMOZ hits. Probably should not be listed.
away.com
1995 domain, no SBL, 7 NANAS - all apparently joe job or "whitening" or "chaff" type false inclusions in spam. away.com appears to be a legitimate travel site with no actual abuse. Probably should not be listed.
centralhome.com
1998 domain, no SBL, no NANAS. May be a legitimate site: "Dance, Exercise, Sports and Fitness Videos, DVD, Books & Accessories". Probably should not be listed.
cheapass.com
1997 domain, no SBL, no NANAS, sells board games. Looks legitimate. Probably should not be listed.
kit.net
1997 domain, no SBL, 2000+ NANAS - user abuse of this hosting site. Redirects to kitnet.globo.com. Clearly this domain is abused. The only question is whether it has enough legitimate use to whitelist.
Hosting IP belongs to Embratel, and is listed in only SPEWS as a /24, which I don't consider particularly meaningful. IP is not listed in any other RBLs that openrbl.org knows about. Possibly ok to whitelist, though as others have noted, it does get abused a lot. 4 DMOZ hits.
kki.net.pl
1997 domain, no SBL, 248 NANAS - almost all in forged headers, appears to be a legitimate Polish ISP with mail and web hosting. Personal web sites probably subject to some abuse, but the NANAS hits were almost all forged mentions in mail headers. Appears to have legitimate uses. 15 DMOZ hits.
ledger-enquirer.com
1997 domain, no SBL, 1 NANAS in spam headers as forged recipient, which is usually meaningless. This is the web site of a local Georgia newspaper owned by large newspaper chain Knight Ridder. Almost certainly not a spam gang.
nana.co.il
2000 or earlier domain, no SBL, 754 NANAS - mostly forged headers, major Internet portal in Israel, appears to have legitimate uses. 40 DMOZ hits.
online-dictionary.biz
March 2004 domain registration, no SBL, no NANAS, mentioned as an online reference in Wikipedia as "free multi-lingual online dictionary between English and seven modern languages". Probably has legitimate uses despite the bizarre choice of TLD.
p5.org.uk
2001 domain belonging to portland.co.uk like 8bit.co.uk and some other free hosts we recently whitelisted, no SBL, 17 NANAS, appears to have some legitimate uses and some minor abuse. 2 DMOZ.
quuxuum.org
1996 domain, no SBL, 23 NANAS, all referring to "evan's" Bill Gates' net worth page, probably a Joe Job or chaff in contest scams. (Or some kind of sick, envy-driven justification in the scammers' puny brains for trying to scam people.) This site looks like a personal web server with some legitimate personal hobbyist uses. Does not seem to be a major spam destination or spammer, at least based on visible sites and NANAS mentions. 6 DMOZ.
republika.pl
1999 domain, no SBL, 12 NANAS, but 2526 DMOZ mentions, so it probably has far more legitimate uses than spam uses. Appears to be a Polish hosting provider. Probably should not be listed.
s5.com
1996 domain, no SBL, 116 NANAS, free hosting site belongs to freeservers.com. Already whitelisted along with others belonging to them.
spaceports.com
1997 domain, no SBL, 22 NANAS from 1999 through 2003. None in 2004. Hosting provider with reasonable looking abuse policies. The reports seem short-lived and few, indicating that they may be stopping abusers. 229 DMOZ hits. Probably should be whitelisted.
t35.com
1999 domain, no SBL, 59 NANAS, 44 DMOZ. Hosting provider. Seems to have reasonable abuse policies including specifically prohibiting spam mentions.
telepolis.com
1996 domain, no SBL, 33 NANAS, 262 DMOZ. Wanadoo Spain ISP. Minor spam from personal web or picture hosting. Spam-mentioned sites seem shut down, so probably has a functional abuse desk. Has some minor abuse, but probably should not be listed.
transnationale.org
1999 domain, no SBL, 250 NANAS, 3 DMOZ. French web site apparently tracking social policies of international companies. The NANAS hits are almost entirely mentions in 419-type spams, probably due to socio-political articles, but that does not make the site's owners spammers. Oddly, it's the same article URI mentioned in every spam I checked, and that URI no longer serves a page. Perhaps the site owners got tired of the abuse and took it down. Again, this makes the site a victim of the mentions rather than a spammer. Probably should not be listed.
up.co.il
1998 or earlier domain, no SBL, 23 NANAS - mix of senders and hosts but relatively few, 48 DMOZ, Israel web hosting company. Some minor abuse, but probably should not be listed. Spam-mentioned sites seem shut down.
xiloo.com
2000 domain, in SBL, 345 NANAS, 8 DMOZ. Appears to be a China ISP or portal. Can't determine much more than that. Also owns xilu.com. Source on WS is:
/home/dbfunk/black-dbfunk-2:xiloo.com
Dave, got any data on them?
zip.net
1998 domain, not in SBL, 230 NANAS looking like abusive users.
Already whitelisted per Joe Wein's report:
"zip.net (http://zipmail.uol.com.br/) is a webmailer by UOL in Brazil."
zonai.com
1999 domain, no SBL, 88 NANAS, 2 DMOZ. Looks like Puerto Rico web portal. Appears to have legitimate uses. NANAS hits all appear to be 419-scam reply mentions and mail headers. Probably should not be listed.
So out of the above all should probably be whitelisted, except xiloo.com and kit.net for which I can't determine enough. Can any Chinese readers check out xiloo.com?
Does anyone have any comments on any of these?
Jeff C. -- "If it appears in hams, then don't list it."
On Saturday, October 16, 2004, 3:57:36 AM, Jeff Chan wrote:
On Wednesday, October 13, 2004, 9:00:29 PM, Jeff Chan wrote:
On Monday, October 11, 2004, 4:59:24 PM, Jeff Chan wrote:
See the first message in this thread for a little more information about some of these:
http://lists.surbl.org/pipermail/discuss/2004-October/003169.html
Thanks to Ryan and others for identifying most of the free hosting sites in the full list. We whitelisted those free sites earlier.
We still need help identifying the other potential FPs from the percentiled top DMOZ and wikipedia list:
http://spamcheck.freeapp.net/whitelists/percentile-wikipedia-dmoz-blocklist....
I went ahead and checked the remainder:
arena.ne.jp
1997 domain, web hosting site belonging to Japan's national telco NTT, no SBL, 72 NANAS - mostly sender addresses and a few abuse reporting addresses. 36 DMOZ hits. Probably should not be listed.
away.com
1995 domain, no SBL, 7 NANAS - all apparently joe job or "whitening" or "chaff" type false inclusions in spam. away.com appears to be a legitimate travel site with no actual abuse. Probably should not be listed.
centralhome.com
1998 domain, no SBL, no NANAS. May be a legitimate site: "Dance, Exercise, Sports and Fitness Videos, DVD, Books & Accessories". Probably should not be listed.
cheapass.com
1997 domain, no SBL, no NANAS, sells board games. Looks legitimate. Probably should not be listed.
kit.net
1997 domain, no SBL, 2000+ NANAS - user abuse of this hosting site. Redirects to kitnet.globo.com. Clearly this domain is abused. The only question is whether it has enough legitimate use to whitelist.
Hosting IP belongs to Embratel, and is listed in only SPEWS as a /24, which I don't consider particularly meaningful. IP is not listed in any other RBLs that openrbl.org knows about. Possibly ok to whitelist, though as others have noted, it does get abused a lot. 4 DMOZ hits.
kki.net.pl
1997 domain, no SBL, 248 NANAS - almost all in forged headers, appears to be a legitimate Polish ISP with mail and web hosting. Personal web sites probably subject to some abuse, but the NANAS hits were almost all forged mentions in mail headers. Appears to have legitimate uses. 15 DMOZ hits.
ledger-enquirer.com
1997 domain, no SBL, 1 NANAS in spam headers as forged recipient, which is usually meaningless. This is the web site of a local Georgia newspaper owned by large newspaper chain Knight Ridder. Almost certainly not a spam gang.
nana.co.il
2000 or earlier domain, no SBL, 754 NANAS - mostly forged headers, major Internet portal in Israel, appears to have legitimate uses. 40 DMOZ hits.
online-dictionary.biz
March 2004 domain registration, no SBL, no NANAS, mentioned as an online reference in Wikipedia as "free multi-lingual online dictionary between English and seven modern languages". Probably has legitimate uses despite the bizarre choice of TLD.
p5.org.uk
2001 domain belonging to portland.co.uk like 8bit.co.uk and some other free hosts we recently whitelisted, no SBL, 17 NANAS, appears to have some legitimate uses and some minor abuse. 2 DMOZ.
quuxuum.org
1996 domain, no SBL, 23 NANAS, all referring to "evan's" Bill Gates' net worth page, probably a Joe Job or chaff in contest scams. (Or some kind of sick, envy-driven justification in the scammers' puny brains for trying to scam people.) This site looks like a personal web server with some legitimate personal hobbyist uses. Does not seem to be a major spam destination or spammer, at least based on visible sites and NANAS mentions. 6 DMOZ.
republika.pl
1999 domain, no SBL, 12 NANAS, but 2526 DMOZ mentions, so it probably has far more legitimate uses than spam uses. Appears to be a Polish hosting provider. Probably should not be listed.
s5.com
1996 domain, no SBL, 116 NANAS, free hosting site belongs to freeservers.com. Already whitelisted along with others belonging to them.
spaceports.com
1997 domain, no SBL, 22 NANAS from 1999 through 2003. None in 2004. Hosting provider with reasonable looking abuse policies. The reports seem short-lived and few, indicating that they may be stopping abusers. 229 DMOZ hits. Probably should be whitelisted.
t35.com
1999 domain, no SBL, 59 NANAS, 44 DMOZ. Hosting provider. Seems to have reasonable abuse policies, including specifically prohibiting spam mentions.
telepolis.com
1996 domain, no SBL, 33 NANAS, 262 DMOZ. Wanadoo Spain ISP. Minor spam from personal web or picture hosting. Spam-mentioned sites seem shut down, so probably has a functional abuse desk. Has some minor abuse, but probably should not be listed.
transnationale.org
1999 domain, no SBL, 250 NANAS, 3 DMOZ, French web site apparently tracking social policies of international companies. The NANAS hits are almost entirely mentions in 419-type spams, probably due to socio-political articles, but that does not make the domain spammers. Oddly it's the same article URI mentioned in every spam I checked, and that URI no longer serves a page. Perhaps the site owners got tired of the abuse and took it down. Again, this does not make the site spammers, more like victims of the mention. Probably should not be listed.
up.co.il
1998 or earlier domain, no SBL, 23 NANAS - mix of senders and hosts but relatively few, 48 DMOZ, Israel web hosting company. Some minor abuse, but probably should not be listed. Spam-mentioned sites seem shut down.
xiloo.com
2000 domain, in SBL, 345 NANAS, 8 DMOZ. Appears to be a China ISP or portal. Can't determine much more than that. Also owns xilu.com. Source on WS is:
/home/dbfunk/black-dbfunk-2:xiloo.com
Dave, got any data on them?
zip.net
1998 domain, not in SBL, 230 NANAS looking like abusive users.
Already whitelisted per Joe Wein's report:
"zip.net (http://zipmail.uol.com.br/) is a webmailer by UOL in Brazil."
zonai.com
1999 domain, no SBL, 88 NANAS, 2 DMOZ. Looks like Puerto Rico web portal. Appears to have legitimate uses. NANAS hits all appear to be 419-scam reply mentions and mail headers. Probably should not be listed.
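Several entries above note whether a hosting IP shows up in a blocklist such as SBL. As a rough sketch of how such a check is usually done over DNS: the IPv4 octets are reversed and prepended to the blocklist zone, and any answer means "listed". The function names here are made up for illustration, the zone is passed in by the caller, and this is not how GetURI or openrbl.org actually implement their checks.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to query for an IPv4 address in a DNSBL zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError if not valid IPv4
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the zone answers for this IP.

    False means "no answer", which covers both "not listed" and
    "lookup failed", so treat a False result with some caution.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False
    return True


# 192.0.2.1 is a documentation address, used only to show the name format.
print(dnsbl_query_name("192.0.2.1", "sbl.spamhaus.org"))
```

To check a domain rather than an IP, one would first resolve the domain to its hosting address and then pass that address to is_listed().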
So all of the above should probably be whitelisted, except xiloo.com and kit.net, for which I can't determine enough. Can any Chinese readers check out xiloo.com?
Does anyone have any comments on any of these?
OK I went ahead and whitelisted all of the above except for kit.net and xiloo.com .
I'd appreciate comments on those two or any of the others.
Jeff C. -- "If it appears in hams, then don't list it."
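The per-domain checks in the message above (SBL status, NANAS count, DMOZ hits) can be sketched as a simple first-pass triage. This is only an illustration: the DomainEvidence type, the triage() function, and the 1000-hit threshold are invented here, and the actual decisions in the message came from reading the NANAS posts by hand (forged headers versus real abuse), which no counter can replace.

```python
from dataclasses import dataclass


@dataclass
class DomainEvidence:
    domain: str
    in_sbl: bool      # any SBL hit reported for the domain
    nanas_hits: int   # number of NANAS mentions
    dmoz_hits: int    # number of DMOZ mentions


def triage(ev: DomainEvidence, max_nanas: int = 1000) -> str:
    """Return "whitelist" or "review" for a possible false positive.

    max_nanas is an arbitrary illustrative cutoff, not a value from the thread.
    """
    if ev.in_sbl:
        return "review"
    # Heavy spam mentions that outweigh directory mentions need a human look.
    if ev.nanas_hits >= max_nanas and ev.nanas_hits > ev.dmoz_hits:
        return "review"
    return "whitelist"


# Counts taken from the message above; kit.net's "2000+" is entered as 2000.
candidates = [
    DomainEvidence("republika.pl", False, 12, 2526),
    DomainEvidence("kit.net", False, 2000, 4),
    DomainEvidence("xiloo.com", True, 345, 8),
]
for ev in candidates:
    print(ev.domain, triage(ev))
```

With these inputs the sketch flags kit.net and xiloo.com for review and passes republika.pl, matching the outcome described above, but only because the cutoff was picked with those examples in mind.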
On Sat, 16 Oct 2004, Jeff Chan wrote:
> xiloo.com
> 2000 domain, in SBL, 345 NANAS, 8 DMOZ. Appears to be a China ISP or portal. Can't determine much more than that. Also owns xilu.com. Source on WS is:
> /home/dbfunk/black-dbfunk-2:xiloo.com
> Dave, got any data on them?
The spam that I got in August referenced the URL: efriend.xiloo .com/0guang.htm
So that could be an abused personal portal.
Unfortunately the site is down right now and I cannot recheck it. (Don't remember exactly what I saw in August ;).
It's a low incidence hit, I don't mind yanking it if the consensus is for it.
On Saturday, October 16, 2004, 6:29:18 PM, David Funk wrote:
> The spam that I got in August referenced the URL: efriend.xiloo .com/0guang.htm
> So that could be an abused personal portal.
> Unfortunately the site is down right now and I cannot recheck it. (Don't remember exactly what I saw in August ;).
> It's a low incidence hit, I don't mind yanking it if the consensus is for it.
Thanks Dave, www.xiloo comes up for me but efriend.xiloo does not. Perhaps they took down a spammer's site? If so that could be good.
But mainly I can't tell what the main site is since I can't read Chinese. It looks like a portal, but it's just a guess. And given that they seem to host multiple customer sites, perhaps we can assume they do hosting also.
Jeff C. -- "If it appears in hams, then don't list it."