On Friday, September 10, 2004, 7:12:23 AM, Chris Santerre wrote:
From: Jeff Chan [mailto:jeffc@surbl.org]
I've extracted the plaintext URI domains from a 14 GB ham corpus, taken the top 70th and 85th percentiles of the most frequently occurring domains, and compared them against all SURBL domains, the master list of which can be found at:
http://spamcheck.freeapp.net/multi.domains.sort
At the 70th percentile level, there were only two matches:
automotivedigest.com
processrequest.com
At the 85th percentile there were a few more:
automotivedigest.com
chartshop.com
ct002.com
dakotaairparts.com
hallogram.com
infoaeroplan.ca
investorsinsight.com
processrequest.com
sitepronews.com
topachat.com
These are arguably false positives. What do we know about them? Should we whitelist or not whitelist any?
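The comparison described above can be sketched roughly as follows. This is a minimal sketch, not the script actually used: it assumes "top 70th percentile" means the most frequent domains that together account for 70% of all domain occurrences in the ham, and the function names, the counts, and the `spam.example` entry are all made up for illustration.

```python
from collections import Counter


def top_share(counts, share):
    """Return the most frequent domains that together cover `share` of all hits.

    Assumption: this cumulative-share reading is only one interpretation of
    "top Nth percentile"; the original method may have differed.
    """
    total = sum(counts.values())
    selected = set()
    running = 0
    # most_common() yields (domain, count) pairs, highest count first.
    for domain, count in counts.most_common():
        if running >= share * total:
            break
        selected.add(domain)
        running += count
    return selected


def blocklist_overlap(counts, blocklist, share):
    """Domains that are both frequent in ham and present on the blocklist."""
    return sorted(top_share(counts, share) & set(blocklist))


# Made-up counts for illustration only; not figures from the ham corpus.
ham_counts = Counter({
    "example.org": 40,
    "automotivedigest.com": 25,
    "example.net": 15,
    "sitepronews.com": 12,
    "example.com": 8,
})
# Hypothetical stand-in for the SURBL master list.
blocklist = {"automotivedigest.com", "sitepronews.com", "spam.example"}

print(blocklist_overlap(ham_counts, blocklist, 0.70))
print(blocklist_overlap(ham_counts, blocklist, 0.85))
```

Under this reading, raising the percentile can only add domains to the candidate set, which matches the 85th percentile list containing everything in the 70th percentile list.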
Nice work. I got none of these marked as spammers. I think sitepronews has caught my eye a few times, but not enough to be marked. Site pro also has:
* 1: allbusinessnews.com
* 2: exactseek.com
* 3: ezinehub.com
* 4: goarticles.com
* 5: novicenews.com
* 6: sitepronews.com
* 7: submitexpress.com
* 8: zinehub.com
If there is a good thing about sitepronews, it's that they seem to send their mail through the same mail server which has an ezinehub.com reverse DNS record. Since they send from a consistent server, they can be trivially blocked on that mail server, as opposed to someone using zombied senders.
That said, since they seem to get mentioned in significant amounts of ham, I'm inclined to whitelist them.
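For anyone who would rather block at the mail server as described above, the reverse DNS check might look something like this. It is only a sketch: the function name and the host name `mail.ezinehub.com` are hypothetical, and it assumes the PTR name of the connecting server has already been looked up (by the MTA, or with something like `socket.gethostbyaddr`).

```python
def rdns_matches(ptr_name, blocked_domain):
    """True if a reverse DNS name is the blocked domain or a host under it."""
    # Normalise case and drop any trailing root dot before comparing.
    name = ptr_name.rstrip(".").lower()
    domain = blocked_domain.rstrip(".").lower()
    # Require a label boundary so "notezinehub.com" does not match.
    return name == domain or name.endswith("." + domain)


# "mail.ezinehub.com" is a hypothetical host name used for illustration.
print(rdns_matches("mail.ezinehub.com.", "ezinehub.com"))
print(rdns_matches("notezinehub.com", "ezinehub.com"))
```

Matching on a label boundary rather than a plain suffix avoids catching unrelated domains that merely end in the same characters.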
Chartshop linked to:
* 1: astrology.com
* 2: astronet.com
* 3: chartshop.com
* 4: kweb.com
Thanks to Ryan setting up a GetURI run we can see that chartshop.com is about 6 years old. Astrology.com was registered in 1995. If these guys were consistent spammers I'd think they would have been shut down by now. Inclined to whitelist.
ct002 linked to (raises an eyebrow):
* 1: 123banners.com
* 2: 123greetings-inc.com
* 3: 123greetings.com
* 4: 123greetings.info
* 5: ct002.com
ct002.com is less than a year old, but banners and greetings are from 1997. These guys seem less than clean, but do seem to appear in newsletters, etc. It may be better to whitelist than create some FPs. 3 NANAS hits on ct002.com.
dakotaairparts.com linked to:
* 1: a250support.com
* 2: avsupport.com
* 3: dakotaairparts.com
* 4: partslogistics.com
Aircraft logistics company with a 7 year old domain name. They are probably not major spammers. 4 NANAS.
investorsinsight.com is not linked to anyone, but is on more than a few people's lists. However, NANAS reports would have me believe they should NOT be listed. (Odd, huh?)
They appear to use a consistent mail server which is not listed by spamhaus. Therefore, they're easily blocked without SURBLs if anyone doesn't want to get their messages. NANAS messages look like legitimate stock newsletters, but obviously some people didn't want to get them.
processrequest.com linked to:
* 1: e2communications.com
* 2: processrequest.com
* 3: prq0.com
Check http://tinyurl.com/4ds43
Just going to their website screams to me to watch them closely! If they are legit, they should be using SURBL to watch their own customers. They are a member of the evil empire DMA as well. In my jaded mind, that's an automatic block here at my company. Obviously different for SURBL. This one needs to be contacted and watched, IMHO.
topachat.com linked to:
* 1: topachat-clust.com
* 2: topachat.com
They appear clean and possibly Joe Jobbed.
3 NANAS hits, some possibly abuse by their users. Their main site looks like a legit business.
Keep in mind, these lists are just good info. They shouldn't be used on their own to determine spamminess. These lists are just to see who they are linked to, and sometimes those links speak volumes. For example, ct002 might need further investigation.
HTH someone.
--Chris
Thanks for your research help Chris, to which I'll add:
automotivedigest.com - 7 year old domain, automotive industry publication, zero NANAS
hallogram.com - 8 year old, zero NANAS, sells barcode equipment
infoaeroplan.ca - under 1 year old, zero NANAS, appears to run "Aeroplan Miles" program for Canadian telco Primus. Probably ham.
Some of these are somewhat grey, but since they also appear in some hand-classified ham, there are reasons to consider whitelisting them in addition to the above research. Therefore unless anyone has additional data, I'm inclined to whitelist them.
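For reference, applying a whitelist to the listed domains amounts to a set difference. A minimal sketch, assuming simple exact matching on lowercased domain names; the function name is hypothetical, `spam.example` is a made-up listed domain, and the real SURBL data processing may work differently.

```python
def apply_whitelist(listed, whitelist):
    """Return the listed domains that are not whitelisted, sorted."""
    # Normalise both sides so case and stray whitespace do not matter.
    allowed = {d.strip().lower() for d in whitelist}
    return sorted({d.strip().lower() for d in listed} - allowed)


# Illustrative data only; "spam.example" is a hypothetical listed domain.
listed = ["Hallogram.com", "infoaeroplan.ca", "spam.example"]
whitelist = ["hallogram.com", "infoaeroplan.ca", "automotivedigest.com"]
print(apply_whitelist(listed, whitelist))
```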
Comments anyone?
BTW, a correction: the ham corpus I was using is 1.4 GB, not 14 GB.
Jeff C.