Just an idea; I don't know if this has ever been discussed.
In the course of operating the surbl lists, the real-time number of requests for each listed domain (and for each unlisted domain as well) is known, as are the IPs of the servers using surbl to run the lookups.
I don't have the data, but I suspect a spam run in progress would be easy to identify: a high number of requests for the spamvertized domain in a short period of time, coming from a large number of geographically diverse mail servers.
Using that data, it should be possible to add an activity bit triggered when activity for the queried domain crosses a predefined threshold (the exact recipe would need extensive tweaking).
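A minimal Python sketch of such an activity bit, assuming the operator's query log yields (domain, source IP, timestamp) events in time order. The class name, window length and thresholds are hypothetical placeholders, not values surbl uses; the bit is set only when a domain sees many queries from many distinct sources within a sliding window.

```python
from collections import defaultdict, deque

# Hypothetical thresholds: the exact recipe would need extensive tweaking.
WINDOW_SECONDS = 600
MIN_QUERIES = 500
MIN_DISTINCT_SOURCES = 100


class ActivityTracker:
    """Sliding-window query counter per domain (illustrative sketch only)."""

    def __init__(self, window=WINDOW_SECONDS, min_queries=MIN_QUERIES,
                 min_sources=MIN_DISTINCT_SOURCES):
        self.window = window
        self.min_queries = min_queries
        self.min_sources = min_sources
        # domain -> deque of (timestamp, source_ip) still inside the window
        self._events = defaultdict(deque)

    def record(self, domain, source_ip, ts):
        """Record one lookup of `domain` from `source_ip` at time `ts` (seconds)."""
        events = self._events[domain]
        events.append((ts, source_ip))
        self._expire(events, ts)

    def _expire(self, events, now):
        # Assumes events arrive in time order, so old ones sit at the left end.
        while events and events[0][0] <= now - self.window:
            events.popleft()

    def activity_bit(self, domain, now):
        """True if `domain` currently sees many queries from many distinct sources."""
        events = self._events.get(domain)
        if not events:
            return False
        self._expire(events, now)
        sources = {ip for _, ip in events}
        return len(events) >= self.min_queries and len(sources) >= self.min_sources
```

Counting distinct sources, not just raw queries, is what separates one busy server re-checking a domain from many servers around the world seeing the same spam.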
If such an activity bit is present, the scores for the other tests could be lowered slightly, with the activity bit acting as a 'score booster'. That way, the impact of a false positive, or of a site generating so few queries that they don't constitute a 'real' spam run, would be lower, while the detection score for an actively spamvertized site would increase.
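The score-booster idea can be sketched the same way. Everything here is hypothetical: the list names, the reduced per-list scores and the booster value are made up for illustration, and the booster is assumed to apply only when at least one list already matches.

```python
# Hypothetical per-list scores, set slightly lower than they would be on their own.
BASE_SCORES = {"list_a": 1.5, "list_b": 1.0}
# Hypothetical extra score added while the activity bit is set.
ACTIVITY_BOOST = 2.0


def uri_score(hits, activity):
    """Sum the reduced list scores; add the booster only alongside a list hit."""
    score = sum(BASE_SCORES.get(name, 0.0) for name in hits)
    if activity and hits:
        score += ACTIVITY_BOOST
    return score
```

With these numbers, a lone hit on `list_a` without the activity bit scores 1.5, while the same hit during an active run scores 3.5.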
Also, since most legitimate mailing lists go to recipients in close geographic proximity, the geographic diversity of such lists should be very different from that of a typical spam run. This kind of location pattern analysis could also be used (internally) as a warning for possible false positives. One step further, a 'spammy' query pattern for an unlisted domain might signal that it should be investigated or listed.
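One way to put a number on "geographic diversity" is the Shannon entropy of the countries the querying servers map to. This is a swapped-in metric, not something specified here, and it assumes an IP-to-country lookup has already happened upstream (the country codes below are just inputs).

```python
import math
from collections import Counter


def geo_entropy(country_codes):
    """Shannon entropy, in bits, of the country distribution of querying servers."""
    counts = Counter(country_codes)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

Queries all coming from one country give 0 bits; queries spread evenly over four countries give 2 bits. Since large legitimate lists can also be global, a figure like this would be one signal among several rather than a verdict.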
Does it make sense?
On 09/08/06, Eric Montréal erv@mailpeers.net wrote:
> also, since most legitimate mailing lists are to recipients in close geographic proximity,
Care to quote your data source for this assumption? Your definition of 'most' and of 'close proximity'?
Peter Bowyer wrote:
> On 09/08/06, Eric Montréal erv@mailpeers.net wrote:
>> also, since most legitimate mailing lists are to recipients in close geographic proximity,
Legitimate mailing lists would include this one, the SA users list and numerous industry lists covering every possible topic from linguistics to engineering and marketing.
I get mail from Microsoft that they send to all their partners worldwide.
Maybe "geographic proximity" is relative to the size of the universe?
Mr Michele Neylon
Blacknight Solutions
Hosting & Colocation, Brand Protection
http://www.blacknight.ie/
http://blog.blacknight.ie/
Tel. 1850 927 280
Intl. +353 (0) 59 9183072
UK: 0870 163 0607
Direct Dial: +353 (0)59 9183090
Fax. +353 (0) 59 9164239
Michele Neylon :: Blacknight Solutions wrote:
> Legitimate mailing lists would include this one, the SA users list and numerous industry lists covering every possible topic from linguistics to engineering and marketing.
> I get mail from Microsoft that they send to all their partners worldwide.
> Maybe "geographic proximity" is relative to the size of the universe?
Looks like something else is the size of the universe ...
Major lists distributed to as many different servers as a spam run are unlikely to be sent from a domain listed in surbl.
When was the last time Microsoft got listed in surbl?
Smaller lists might end up being sent from a falsely listed domain (a false positive), and the idea is that the surbl query pattern (queries per minute, bursty vs. continuous traffic, historical comparisons, geolocation and perhaps other metrics) should make it possible to tell such a list apart from a spam run.
An antispam service such as surbl has a far more complete picture on a global scale than anyone operating a few mail servers. The query pattern such a service sees mirrors major spam runs, and this could be exploited. That was the basic idea.
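The "burst/continuous" and "historical comparisons" metrics could, for instance, compare a domain's query count in the current window against its own average over past windows. The function names, the 10x factor and the 50-query floor are hypothetical choices for illustration.

```python
from statistics import mean


def burst_ratio(current_count, history_counts):
    """Current window's query count divided by the historical mean per window."""
    baseline = mean(history_counts) if history_counts else 0.0
    # Treat a baseline below one query per window as one, so never-seen
    # domains don't cause a division by zero.
    return current_count / max(baseline, 1.0)


def is_burst(current_count, history_counts, factor=10.0, floor=50):
    """Flag a sudden burst: far above the domain's own history and not trivially small."""
    return (current_count >= floor
            and burst_ratio(current_count, history_counts) >= factor)
```

A steady list with a stable weekly rhythm stays near a ratio of 1, while a domain that jumps from a handful of lookups to hundreds in one window trips the flag.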
On Wednesday, August 9, 2006, 3:20:58 PM, Eric Montréal wrote:
> An antispam service such as surbl does have a far more complete picture on a global scale than anyone operating some mail servers. The access pattern such a service will see is mirroring major spam runs, and this could be exploited. That was the basic idea.
It's an interesting idea. Does anyone have any research or references about the geographic distribution of spam versus ham? Presumably it's been studied.
Surely there is some ham that's sent pretty much without regard to geographic boundaries. After all, the Internet does include some global interests (other than pills, warez, mortgages, etc.).
Jeff C. -- Don't harm innocent bystanders.
On 8/9/2006 10:03 PM, Jeff Chan wrote:
> It's an interesting idea. Does anyone have any research or references about the geographic distribution of spam versus ham? Presumably it's been studied.
I don't know of any research data, but I'd assumed that Dallas is doing this with URIBL. I'd always thought it'd be an obvious thing to do for someone creating and serving their own list data.
Daryl
Peter Bowyer wrote:
> On 09/08/06, Eric Montréal erv@mailpeers.net wrote:
>> also, since most legitimate mailing lists are to recipients in close geographic proximity,
> Care to quote your data source for this assumption? Your definition of 'most' and of 'close proximity'?
Obviously I'm not the one running surbl, so how could I already have the data?
The point was that, except for very large lists sent from domains that surbl will never list in the first place, most lists (meaning a statistically significant portion) should generate surbl traffic patterns different enough to distinguish them from a spam run, whose recipients are located all around the world and which generates a high number of requests from very diverse places.
The idea was that data mining in surbl logs (or in the logs of other RBL / URI services queried by a large number of servers) might improve accuracy by allowing reliable real-time detection of spam runs in progress. I might be wrong, or maybe it's not surbl's role to do such analysis.
Hi!
> The idea was that data mining in surbl logs (or other RBL / URI services queried by a large number of servers) might enhance accuracy by allowing accurate realtime detection of spams in progress. I might be wrong, or maybe it's not surbl's role to do such analysis.
We already do...
Bye, Raymond.