[SURBL-Discuss] Large ham corpus hits against SURBLs

Jeff Chan jeffc at surbl.org
Sat Sep 11 09:40:22 CEST 2004


On Friday, September 10, 2004, 7:12:23 AM, Chris Santerre wrote:
>>From: Jeff Chan [mailto:jeffc at surbl.org]

>>I've extracted the plaintext * URI domains from a 14 GB ham corpus,
>>taken the top 70th and 85th percentiles of the most frequently
>>occurring domains and compared them against all SURBL domains,
>>the master list of which can be found at:
>>
>>  http://spamcheck.freeapp.net/multi.domains.sort
>>
>>At the 70th percentile level, there were only two matches:
>>
>>  automotivedigest.com
>>  processrequest.com
>>
>>At the 85th percentile there were a few more:
>>
>>  automotivedigest.com
>>  chartshop.com
>>  ct002.com
>>  dakotaairparts.com
>>  hallogram.com
>>  infoaeroplan.ca
>>  investorsinsight.com
>>  processrequest.com
>>  sitepronews.com
>>  topachat.com
>>
>>These are arguably false positives.  What do we know about them?
>>Should we whitelist or not whitelist any?
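
For anyone who wants to repeat the comparison on their own ham, here is a rough Python sketch of the steps quoted above. It is not the script actually used. It assumes "top Nth percentile" means the most frequent domains that together account for N% of all URI occurrences, and it reduces hosts to their last two labels, which is wrong for names under two-level TLDs such as co.uk. The regex and all function names are made up for illustration.

```python
import re
from collections import Counter

# Simplified, hypothetical pattern: captures the host part of http(s) URIs.
URI_HOST = re.compile(r"https?://([A-Za-z0-9.-]+)", re.IGNORECASE)


def registered_domain(host):
    """Naively keep the last two labels (assumption; ignores co.uk etc.)."""
    labels = host.lower().strip(".").split(".")
    return ".".join(labels[-2:])


def count_domains(messages):
    """Count how often each domain appears in URIs across all messages."""
    counts = Counter()
    for text in messages:
        for host in URI_HOST.findall(text):
            counts[registered_domain(host)] += 1
    return counts


def top_by_cumulative_share(counts, share):
    """Most frequent domains that together cover `share` of all hits."""
    total = sum(counts.values())
    selected, running = set(), 0
    for domain, n in counts.most_common():
        if running >= share * total:
            break
        selected.add(domain)
        running += n
    return selected


def ham_hits(counts, share, blocklist):
    """Blocklisted domains that also appear among the top ham domains."""
    return sorted(top_by_cumulative_share(counts, share) & set(blocklist))
```

Calling ham_hits(count_domains(messages), 0.70, surbl_domains) and then again with 0.85 would give the two lists, where messages is an iterable of plaintext message bodies and surbl_domains is the contents of multi.domains.sort read one domain per line.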

> Nice work. I got none of these marked as spammers. I think sitepronews has
> caught my eye a few times, but not enough to be marked. Site pro also has:

>     * 1: allbusinessnews.com
>     * 2: exactseek.com
>     * 3: ezinehub.com
>     * 4: goarticles.com
>     * 5: novicenews.com
>     * 6: sitepronews.com
>     * 7: submitexpress.com
>     * 8: zinehub.com

If there is a good thing about sitepronews, it's that they seem
to send their mail through the same mail server which has an
ezinehub.com reverse DNS record.  Since they send from a
consistent server, they can be trivially blocked on that mail
server, as opposed to someone using zombied senders.

That said, since they seem to get mentioned in significant
amounts of ham, I'm inclined to whitelist them.
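
As a rough illustration of blocking on a consistent sending server, the sketch below checks whether a connecting IP's reverse DNS name falls under a given domain. It is only a sketch: the function name and the injectable resolve parameter are made up here, and a real mail server would normally do this in its own access rules (and should confirm the PTR name with a forward lookup, which this omits).

```python
import socket


def ptr_matches(ip, blocked_domain, resolve=socket.gethostbyaddr):
    """Return True if the PTR name for `ip` is `blocked_domain` or a subdomain.

    `resolve` is injectable (hypothetical convenience) so the check can be
    exercised without touching the network.
    """
    try:
        hostname = resolve(ip)[0]
    except OSError:
        # No PTR record or lookup failure: treat as no match.
        return False
    hostname = hostname.lower().rstrip(".")
    blocked = blocked_domain.lower().rstrip(".")
    return hostname == blocked or hostname.endswith("." + blocked)
```

For example, ptr_matches(client_ip, "ezinehub.com") would be true for a sender whose reverse DNS is some host under ezinehub.com, and false for a zombied machine with an unrelated or missing PTR record.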

> Chartshop linked to:
>     * 1: astrology.com
>     * 2: astronet.com
>     * 3: chartshop.com
>     * 4: kweb.com

Thanks to Ryan setting up a GetURI run we can see that
chartshop.com is about 6 years old.  Astrology.com was
registered in 1995.  If these guys were consistent spammers
I'd think they would have been shut down by now.  Inclined
to whitelist.

> ct002 linked to (raises an eyebrow):
>     * 1: 123banners.com
>     * 2: 123greetings-inc.com
>     * 3: 123greetings.com
>     * 4: 123greetings.info
>     * 5: ct002.com

ct002.com is less than a year old, but 123banners and 123greetings
are from 1997.  These guys seem less than clean, but do seem to
appear in newsletters, etc.  It may be better to whitelist
than create some FPs.  3 NANAS hits on ct002.com.

> dakotaairparts.com linked to:
>     * 1: a250support.com
>     * 2: avsupport.com
>     * 3: dakotaairparts.com
>     * 4: partslogistics.com

Aircraft logistics company with a 7 year old domain name.
They are probably not major spammers.   4 NANAS.

> investorsinsight.com not linked to anyone, but on more than a few people's
> lists. However, NANAS reports would have me believe they should NOT be
> listed. (Odd, huh?)

They appear to use a consistent mail server which is not listed
by spamhaus.  Therefore, they're easily blocked without SURBLs
if anyone doesn't want to get their messages.  NANAS messages
look like legitimate stock newsletters, but obviously some people
didn't want to get them.

> processrequest.com linked to:
>     * 1: e2communications.com
>     * 2: processrequest.com
>     * 3: prq0.com
> Check http://tinyurl.com/4ds43
> Just going to their website screams to me to watch them closely! If they are
> legit, they should be using SURBL to watch their own customers. They are a
> member of the evil empire DMA as well. In my jaded mind, that's an automatic
> block here at my company. Obviously different for SURBL. This one needs to
> be contacted and watched, IMHO.

> topachat.com linked to:
>     * 1: topachat-clust.com
>     * 2: topachat.com
> They appear clean and possibly Joe Jobbed.

3 NANAS hits, some possibly abuse by their users.  Their main
site looks like a legit business.

> Keep in mind, these lists are just good info. They shouldn't be used solely
> to determine their spamminess on their own. These lists are just to see who
> they are linked to, and sometimes those links speak volumes. Like ct002 might
> need further investigation.

> HTH someone.

> --Chris

Thanks for your research help Chris, to which I'll add:

automotivedigest.com - 7 year old domain, automotive industry
publication, zero NANAS

hallogram.com - 8 year old, zero NANAS, sells barcode equipment

infoaeroplan.ca - under 1 year old, zero NANAS, appears to run
"Aeroplan Miles" program for Canadian telco Primus. Probably
ham.

Some of these are somewhat grey, but since they also appear
in some hand-classified ham, there are reasons to consider
whitelisting them in addition to the above research.  Therefore
unless anyone has additional data, I'm inclined to whitelist them.

Comments anyone?


BTW, correction, the ham corpus I was using is 1.4 GB not 14.

Jeff C.


