-----Original Message-----
From: Jeff Chan [mailto:jeffc@surbl.org]
Sent: Friday, September 10, 2004 2:40 AM
To: SURBL Discuss
Subject: [SURBL-Discuss] Large ham corpus hits against SURBLs

I've extracted the plaintext* URI domains from a 14 GB ham corpus, taken the top 70th and 85th percentiles of the most frequently occurring domains and compared them against all SURBL domains, the master list of which can be found at:
http://spamcheck.freeapp.net/multi.domains.sort
At the 70th percentile level, there were only two matches:
automotivedigest.com
processrequest.com
At the 85th percentile there were a few more:
automotivedigest.com
chartshop.com
ct002.com
dakotaairparts.com
hallogram.com
infoaeroplan.ca
investorsinsight.com
processrequest.com
sitepronews.com
topachat.com
These are arguably false positives. What do we know about them? Should we whitelist or not whitelist any?
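As a rough illustration of the comparison described above, here is a minimal Python sketch. It assumes "top Nth percentile" means the most frequent domains that together account for N percent of all domain occurrences, which is only one possible reading of the original method. The function names and the example domains are hypothetical.

```python
from collections import Counter


def top_domains_by_coverage(counts, coverage):
    """Return the most frequent domains that together account for
    `coverage` (a fraction between 0 and 1) of all occurrences.

    Assumption: this cumulative-coverage reading of "percentile" is
    an interpretation, not necessarily the method used for the corpus.
    """
    total = sum(counts.values())
    selected, running = set(), 0
    for domain, n in counts.most_common():
        if running >= coverage * total:
            break
        selected.add(domain)
        running += n
    return selected


def blocklist_hits(counts, blocklist, coverage):
    """Domains that are both frequent in ham and present on the blocklist."""
    return sorted(top_domains_by_coverage(counts, coverage) & set(blocklist))


# Hypothetical ham-corpus counts and blocklist entries.
ham_counts = Counter({"a.example": 60, "b.example": 20,
                      "c.example": 15, "d.example": 5})
listed = {"b.example", "c.example", "z.example"}

hits_70 = blocklist_hits(ham_counts, listed, 0.70)
hits_85 = blocklist_hits(ham_counts, listed, 0.85)
```

Under this reading the 85 percent set always contains the 70 percent set, which matches the two result lists above.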
* Looking at plaintext has advantages and disadvantages:
1. quick and easy
2. does not "double or triple count" messages which also have Base64 or quoted-printable encoded versions of the same URIs
3. misses some such encoded URIs which don't have plaintext equivalents in a different part of the message
Nonetheless the data are still probably generally useful.
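For comparison, the trade-off in the note above could be avoided by decoding each MIME part and collecting hosts into one set per message, so that encoded URIs are seen but the same host is counted only once. The following is a minimal sketch using Python's standard `email` package, not the method used for the corpus run. The regular expression and the function name `message_hosts` are assumptions for illustration, and reducing hostnames to registered domains is left out.

```python
import email
import re
from email import policy
from urllib.parse import urlsplit

# Deliberately simple URL pattern (an assumption; real URI extraction
# in mail filters is considerably more thorough).
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def message_hosts(raw_bytes):
    """Return the set of URI hostnames found in all text parts of a message.

    The email package decodes Base64 and quoted-printable bodies, and the
    set ensures a host repeated across alternative parts is counted once.
    """
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    hosts = set()
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue
        try:
            text = part.get_content()
        except (LookupError, UnicodeError):
            continue  # unknown or broken charset: skip this part
        for url in URL_RE.findall(text):
            try:
                host = urlsplit(url).hostname
            except ValueError:
                continue  # malformed URL
            if host:
                hosts.add(host)
    return hosts
```

Feeding the per-message sets into a frequency counter would then give per-message rather than per-occurrence domain counts.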
Nice work. I got none of these marked as spammers. I think sitepronews has caught my eye a few times, but not enough to be marked. Site pro also has:
* 1: allbusinessnews.com
* 2: exactseek.com
* 3: ezinehub.com
* 4: goarticles.com
* 5: novicenews.com
* 6: sitepronews.com
* 7: submitexpress.com
* 8: zinehub.com

Chartshop linked to:
* 1: astrology.com
* 2: astronet.com
* 3: chartshop.com
* 4: kweb.com

ct002 linked to (raises an eyebrow):
* 1: 123banners.com
* 2: 123greetings-inc.com
* 3: 123greetings.com
* 4: 123greetings.info
* 5: ct002.com

dakotaairparts.com linked to:
* 1: a250support.com
* 2: avsupport.com
* 3: dakotaairparts.com
* 4: partslogistics.com
investorsinsight.com is not linked to anyone, but is on more than a few people's lists. However, NANAS reports would have me believe they should NOT be listed. (Odd, huh?)
processrequest.com linked to:
* 1: e2communications.com
* 2: processrequest.com
* 3: prq0.com
Check http://tinyurl.com/4ds43
Just going to their website screams to me to watch them closely! If they are legit, they should be using SURBL to watch their own customers. They are a member of the evil empire DMA as well. In my jaded mind, that's an automatic block here at my company. Obviously different for SURBL. This one needs to be contacted and watched, IMHO.

topachat.com linked to:
* 1: topachat-clust.com
* 2: topachat.com
They appear clean and possibly Joe Jobbed.

Keep in mind, these lists are just good info. They shouldn't be used on their own to determine spamminess. They are just to see who a domain is linked to, and sometimes those links speak volumes. For example, ct002 might need further investigation.
HTH someone.
--Chris