-----Original Message-----
From: Jeff Chan [mailto:jeffc@surbl.org]
Sent: Friday, September 10, 2004 2:40 AM
To: SURBL Discuss
Subject: [SURBL-Discuss] Large ham corpus hits against SURBLs
I've extracted the plaintext* URI domains from a 14 GB ham corpus, taken the top 70th and 85th percentiles of the most frequently occurring domains and compared them against all SURBL domains, the master list of which can be found at:
http://spamcheck.freeapp.net/multi.domains.sort
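The percentile cut and comparison described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the script actually used for this run. It reads "top Nth percentile" as the most frequent domains that together account for N percent of all domain occurrences, which fits the 85th percentile cut producing more matches than the 70th, although the post does not define the term. It also assumes a local copy of the SURBL master list with one domain per line. The function names are hypothetical.

```python
from collections import Counter


def top_domains_by_coverage(counts: Counter, percent: float) -> list[str]:
    """Most frequent domains that together cover `percent`% of all
    occurrences (an assumed meaning of "top Nth percentile")."""
    threshold = sum(counts.values()) * percent / 100.0
    selected, running = [], 0
    for domain, n in counts.most_common():  # highest count first
        if running >= threshold:
            break
        selected.append(domain)
        running += n
    return selected


def load_surbl_domains(path: str) -> set[str]:
    """Assumes a plain-text list with one domain per line."""
    with open(path, encoding="utf-8", errors="ignore") as fh:
        return {line.strip().lower() for line in fh if line.strip()}


def ham_hits(counts: Counter, surbl: set[str], percent: float) -> list[str]:
    """Listed domains that fall inside the top `percent` coverage band."""
    return sorted(d for d in top_domains_by_coverage(counts, percent) if d in surbl)


if __name__ == "__main__":
    # Toy numbers for illustration only, not data from the ham corpus.
    counts = Counter({"example.com": 50, "news.example.org": 30,
                      "listed.example.com": 15, "rare.example.net": 5})
    surbl = {"listed.example.com", "rare.example.net"}
    print(ham_hits(counts, surbl, 70))  # []
    print(ham_hits(counts, surbl, 85))  # ['listed.example.com']
```

With the toy counts, the 70 percent band holds only the two most frequent domains and finds no listed ones, while the 85 percent band reaches one domain further down and picks up a listed domain. That mirrors, in miniature, why a wider cut surfaces more potential false positives.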
At the 70th percentile level, there were only two matches:
automotivedigest.com
processrequest.com
At the 85th percentile there were a few more:
automotivedigest.com
chartshop.com
ct002.com
dakotaairparts.com
hallogram.com
infoaeroplan.ca
investorsinsight.com
processrequest.com
sitepronews.com
topachat.com
These are arguably false positives. What do we know about them? Should we whitelist any of them or not?
* Looking at plaintext has advantages and disadvantages:
1. quick and easy
2. does not "double or triple count" messages which also have BASE 64 or quoted-printable encoded versions of the same URIs
3. misses some such encoded URIs which don't have plaintext equivalents in a different part of the message
Nonetheless the data are still probably generally useful.
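The plaintext-only trade-off above can be illustrated with Python's standard `email` package. This is a hypothetical sketch, not the tool used for the corpus run: it skips text/plain parts declared as base64 or quoted-printable, so a URI that exists only in encoded form is missed, while an encoded duplicate of a plaintext URI is never counted a second time. It also assumes a deliberately simple URI regex, counts each host at most once per message, and keeps full host names rather than reducing them to the registered domains that SURBL lookups use.

```python
import email
import re
from collections import Counter
from email import policy
from urllib.parse import urlsplit

# Deliberately simple pattern (an assumption); real URI extraction is messier.
URI_RE = re.compile(r"https?://[^\s<>\"')]+", re.IGNORECASE)
ENCODED = ("base64", "quoted-printable")


def plaintext_uri_domains(raw: bytes) -> set[str]:
    """Host names of URIs found in text/plain parts that are not
    base64 or quoted-printable encoded."""
    msg = email.message_from_bytes(raw, policy=policy.default)
    hosts = set()
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        cte = str(part.get("Content-Transfer-Encoding", "7bit")).strip().lower()
        if cte in ENCODED:
            continue  # encoded-only URIs are missed, by design
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "latin-1"
        try:
            text = payload.decode(charset, errors="replace")
        except LookupError:  # unknown charset name in the header
            text = payload.decode("latin-1")
        for uri in URI_RE.findall(text):
            try:
                host = urlsplit(uri).hostname  # lowercased by urlsplit
            except ValueError:  # e.g. a malformed bracketed host
                continue
            host = (host or "").rstrip(".")
            if host:
                hosts.add(host)
    return hosts


def count_corpus(messages) -> Counter:
    """Count each host at most once per message across a corpus."""
    counts = Counter()
    for raw in messages:
        counts.update(plaintext_uri_domains(raw))
    return counts
```

A plain 7bit message mentioning the same host twice contributes one count, and a part marked quoted-printable contributes nothing, even when its URI would be readable as-is. That is the "misses some encoded URIs" cost accepted in exchange for speed and no double counting.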
Nice work. I got none of these marked as spammers. I think sitepronews has caught my eye a few times, but not enough to be marked. Site pro also has:
* 1: allbusinessnews.com
* 2: exactseek.com
* 3: ezinehub.com
* 4: goarticles.com
* 5: novicenews.com
* 6: sitepronews.com
* 7: submitexpress.com
* 8: zinehub.com
Chartshop linked to:
* 1: astrology.com
* 2: astronet.com
* 3: chartshop.com
* 4: kweb.com
ct002 linked to (raises an eyebrow):
* 1: 123banners.com
* 2: 123greetings-inc.com
* 3: 123greetings.com
* 4: 123greetings.info
* 5: ct002.com
dakotaairparts.com linked to:
* 1: a250support.com
* 2: avsupport.com
* 3: dakotaairparts.com
* 4: partslogistics.com
investorsinsight.com is not linked to anyone, but it is on more than a few people's lists. However, the NANAS reports would have me believe they should NOT be listed. (Odd, huh?)
processrequest.com linked to:
* 1: e2communications.com
* 2: processrequest.com
* 3: prq0.com
Check http://tinyurl.com/4ds43
Just going to their website screams to me to watch them closely! If they are legit, they should be using SURBL to watch their own customers. They are a member of the evil empire DMA as well. In my jaded mind, that's an automatic block here at my company. Obviously it's different for SURBL. This one needs to be contacted and watched, IMHO.
topachat.com linked to:
* 1: topachat-clust.com
* 2: topachat.com
They appear clean and possibly Joe Jobbed.
Keep in mind, these lists are just good info. They shouldn't be used on their own to judge a domain's spamminess. These lists are just to see who they are linked to, and sometimes those links speak volumes. For example, ct002 might need further investigation.
HTH someone.
--Chris
On Friday, September 10, 2004, 7:12:23 AM, Chris Santerre wrote:
Nice work. I got none of these marked as spammers. I think sitepronews has caught my eye a few times, but not enough to be marked. Site pro also has:
* 1: allbusinessnews.com
* 2: exactseek.com
* 3: ezinehub.com
* 4: goarticles.com
* 5: novicenews.com
* 6: sitepronews.com
* 7: submitexpress.com
* 8: zinehub.com
If there is a good thing about sitepronews, it's that they seem to send their mail through the same mail server which has an ezinehub.com reverse DNS record. Since they send from a consistent server, they can be trivially blocked on that mail server, as opposed to someone using zombied senders.
That said, since they seem to get mentioned in significant amounts of ham, I'm inclined to whitelist them.
Chartshop linked to:
* 1: astrology.com
* 2: astronet.com
* 3: chartshop.com
* 4: kweb.com
Thanks to Ryan setting up a GetURI run we can see that chartshop.com is about 6 years old. Astrology.com was registered in 1995. If these guys were consistent spammers I'd think they would have been shut down by now. Inclined to whitelist.
ct002 linked to (raises an eyebrow):
* 1: 123banners.com
* 2: 123greetings-inc.com
* 3: 123greetings.com
* 4: 123greetings.info
* 5: ct002.com
ct002.com is less than a year old, but the banners and greetings domains are from 1997. These guys seem less than clean, but do seem to appear in newsletters, etc. It may be better to whitelist than to create some FPs. 3 NANAS on ct002.com.
dakotaairparts.com linked to:
* 1: a250support.com
* 2: avsupport.com
* 3: dakotaairparts.com
* 4: partslogistics.com
Aircraft logistics company with a 7 year old domain name. They are probably not major spammers. 4 NANAS.
investorsinsight.com is not linked to anyone, but it is on more than a few people's lists. However, the NANAS reports would have me believe they should NOT be listed. (Odd, huh?)
They appear to use a consistent mail server which is not listed by spamhaus. Therefore, they're easily blocked without SURBLs if anyone doesn't want to get their messages. NANAS messages look like legitimate stock newsletters, but obviously some people didn't want to get them.
processrequest.com linked to:
* 1: e2communications.com
* 2: processrequest.com
* 3: prq0.com
Check http://tinyurl.com/4ds43
Just going to their website screams to me to watch them closely! If they are legit, they should be using SURBL to watch their own customers. They are a member of the evil empire DMA as well. In my jaded mind, that's an automatic block here at my company. Obviously it's different for SURBL. This one needs to be contacted and watched, IMHO.
topachat.com linked to:
* 1: topachat-clust.com
* 2: topachat.com
They appear clean and possibly Joe Jobbed.
3 NANAS hits, some possibly abuse by their users. Their main site looks like a legit business.
Keep in mind, these lists are just good info. They shouldn't be used on their own to judge a domain's spamminess. These lists are just to see who they are linked to, and sometimes those links speak volumes. For example, ct002 might need further investigation.
HTH someone.
--Chris
Thanks for your research help Chris, to which I'll add:
automotivedigest.com - 7 year old domain, automotive industry publication, zero NANAS
hallogram.com - 8 year old, zero NANAS, sells barcode equipment
infoaeroplan.ca - under 1 year old, zero NANAS, appears to run "Aeroplan Miles" program for Canadian telco Primus. Probably ham.
Some of these are somewhat grey, but since they also appear in some hand-classified ham, there are reasons to consider whitelisting them in addition to the above research. Therefore unless anyone has additional data, I'm inclined to whitelist them.
Comments anyone?
BTW, a correction: the ham corpus I was using is 1.4 GB, not 14 GB.
Jeff C.
On Saturday, September 11, 2004, 12:40:22 AM, Jeff Chan wrote:
On Friday, September 10, 2004, 7:12:23 AM, Chris Santerre wrote:
processrequest.com linked to:
* 1: e2communications.com
* 2: processrequest.com
* 3: prq0.com
Check http://tinyurl.com/4ds43
Just going to their website screams to me to watch them closely! If they are legit, they should be using SURBL to watch their own customers. They are a member of the evil empire DMA as well. In my jaded mind, that's an automatic block here at my company. Obviously it's different for SURBL. This one needs to be contacted and watched, IMHO.
I missed commenting on these guys. They do look quite spammy, but they've been around since 1996 and their Exodus IP addresses which they send from and host at are not listed by spamhaus.
I don't remember what Exodus' abuse policies are like, but I think it's safe to assume they're at least a little stricter than, say, China Telecom. The age of their registration and the fact that they're hosted at a decent ISP lead me to think they may be OK to whitelist.
They also send mail from a consistent IP address of 216.39.67.122 which is easily blocked.
Does anyone have other ham hit examples for them?
Jeff C.
Looking at the hams, it looks like some very legitimate companies use ProcessRequest to send their mailing lists. When those mailings go out, they often have links back to ProcessRequest.com.
So unfortunately it looks like we can't list these guys.
Jeff C.
On Sat, 11 Sep 2004 00:40:22 -0700, Jeff Chan jeffc@surbl.org wrote:
On Friday, September 10, 2004, 7:12:23 AM, Chris Santerre wrote:
From: Jeff Chan [mailto:jeffc@surbl.org]
dakotaairparts.com linked to:
* 1: a250support.com
* 2: avsupport.com
* 3: dakotaairparts.com
* 4: partslogistics.com
Aircraft logistics company with a 7 year old domain name. They are probably not major spammers. 4 NANAS.
Definite ham: one of our clients uses these guys heavily.
investorsinsight.com is not linked to anyone, but it is on more than a few people's lists. However, the NANAS reports would have me believe they should NOT be listed. (Odd, huh?)
They appear to use a consistent mail server which is not listed by spamhaus. Therefore, they're easily blocked without SURBLs if anyone doesn't want to get their messages. NANAS messages look like legitimate stock newsletters, but obviously some people didn't want to get them.
These guys are whitelist worthy, however their mailings can appear quite spammy.
topachat.com linked to:
* 1: topachat-clust.com
* 2: topachat.com
They appear clean and possibly Joe Jobbed.
3 NANAS hits, some possibly abuse by their users. Their main site looks like a legit business.
Definite ham; we have ham reports to back this.
On Saturday, September 11, 2004, 12:40:22 AM, Jeff Chan wrote:
At the 85th percentile there were a few more:
automotivedigest.com
chartshop.com
ct002.com
dakotaairparts.com
hallogram.com
infoaeroplan.ca
investorsinsight.com
processrequest.com
sitepronews.com
topachat.com
OK, I've whitelisted these 85th percentile large ham corpus hits after some checking by Joe Wein, Chris, David Hooton and me.
If anyone has any further comments on these, please post or email me off list.
Jeff C.