Good morning, Jeff, all,
On Sat, 10 Apr 2004, Jeff Chan wrote:
BTW, Kelsey and I brainstormed last night and I think we have a way to effectively prejudice new domain reports coming in from SpamCop without reference to SBL or to geographic databases like IP::Country::Fast or any other external sources like I had in mind originally.
It's so simple that I might be tempted to call it elegant:
- Resolve the incoming spam domains from SC into A and
perhaps NS records.
NS is a little tougher. I could see us hurting people who left their name service with their registrar, or large ISPs that simply host a lot of domains. It would also be possible for a spammer to simply claim that ns1.earthlink.net and ns2.aol.com are their secondary and tertiary name servers when they aren't. It's not possible - or at least tougher - to fake A records.
- Keep a persistent tally counting those IPs. (a history)
- For As or NSes of incoming domains that match many identical
or nearby IP tallies (i.e., the new domains use known bad old IPs), drop their inclusion thresholds in some statistically cool and relevant way.
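A rough sketch of the three steps above, in Python, might look like the following. Everything specific in it is an assumption made for illustration rather than part of the proposal: "nearby" is taken to mean "same /24", exact matches are weighted double, the base threshold of 10 reports and floor of 2 are made-up numbers, and the names IPTally, resolve_a and inclusion_threshold are hypothetical. Only A records are used, given the NS concerns above.

```python
import ipaddress
import socket
from collections import Counter

BASE_THRESHOLD = 10  # assumed: reports normally needed to list a domain
MIN_THRESHOLD = 2    # assumed floor: some reports are always required


def resolve_a(domain):
    """Return the IPv4 A records for a domain, or [] if it does not resolve."""
    try:
        return socket.gethostbyname_ex(domain)[2]
    except OSError:
        return []


class IPTally:
    """History of A-record IPs seen for previously reported domains."""

    def __init__(self):
        self.exact = Counter()   # sightings per individual IP
        self.nearby = Counter()  # sightings per enclosing /24 (assumed "nearby")

    @staticmethod
    def _block(ip):
        return ipaddress.ip_network(f"{ip}/24", strict=False)

    def record(self, ips):
        """Add the IPs of a reported domain to the history."""
        for ip in ips:
            self.exact[ip] += 1
            self.nearby[self._block(ip)] += 1

    def score(self, ips):
        """Weight identical IPs more heavily than same-/24 neighbours."""
        return sum(2 * self.exact[ip] + self.nearby[self._block(ip)] for ip in ips)


def inclusion_threshold(tally, ips):
    """Lower the report threshold as the history score grows, down to the floor."""
    return max(MIN_THRESHOLD, BASE_THRESHOLD - tally.score(ips))
```

A new domain would be handled as inclusion_threshold(tally, resolve_a(domain)), followed by tally.record(...) once the domain is actually reported. The linear subtraction is only a placeholder for whatever statistically sound weighting ends up being used.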
To our thinking, this will automatically and in a self-tuning way catch spam gangs, rogue IPs, rogue blocks, rogue ISPs in any nation, etc. (Manually resolving some of the domains in spams I get seems to show China and a few gangs a lot. I'd dearly like to crush them early and often. Building this refinement into the second version of the sc.surbl.org data engine may very well do that.)
The big advantage is that far fewer reports would be needed for a *new* domain to get added to the list if it has an IP near previously reported domains' IPs. We would expire IPs like domains, but probably with a longer time window for IPs, so that cleaned IPs would eventually come off the tallies.
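The expiry idea could be sketched in Python roughly as below. The window lengths (30 days for domains, 90 days for IPs) are purely assumed values, chosen only to show the IP window being the longer of the two, and ExpiringTally is a hypothetical name. Timestamps are passed in explicitly so the behaviour is easy to check.

```python
from dataclasses import dataclass, field

DAY = 86400
DOMAIN_WINDOW = 30 * DAY  # assumed expiry window for domains
IP_WINDOW = 90 * DAY      # assumed: longer window for IPs


@dataclass
class ExpiringTally:
    """Tally whose sightings drop off once they are older than `window` seconds."""
    window: float
    sightings: dict = field(default_factory=dict)  # key -> list of timestamps

    def record(self, key, now):
        self.sightings.setdefault(key, []).append(now)

    def expire(self, now):
        """Discard sightings older than the window; drop keys with none left."""
        cutoff = now - self.window
        for key in list(self.sightings):
            fresh = [t for t in self.sightings[key] if t >= cutoff]
            if fresh:
                self.sightings[key] = fresh
            else:
                del self.sightings[key]

    def count(self, key):
        return len(self.sightings.get(key, []))
```

With these assumed windows, a domain and its IP both seen once on day 0 would, after 60 days, leave the domain expired while the IP still counts against newly reported neighbours; a cleaned-up IP falls off after day 90.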
To clarify, the IPs would not get added to any lists, just get used internally to lower the inclusion threshold for the number of SpamCop reports needed to get added. Inclusion would still be triggered by SpamCop reports, but in a more sensitive way for bad guy IPs.
Seems almost too good to be true. Am I missing something?
The approach made sense to me as well. http://www.stearns.org/sa-blacklist/spamip.current.txt is the report created from the A record harvesting. I'd be glad to provide the raw data collected over the past 5 months. I also have SOA and whois data for the entire sa-blacklist.
Cheers,
- Bill
---------------------------------------------------------------------------
"There are two kinds of people, those who work and those who want the credit. Try to stay in the first category. The competition is much smaller." -- Mahatma Gandhi
--------------------------------------------------------------------------
William Stearns (wstearns@pobox.com). Mason, Buildkernel, freedups, p0f, rsync-backup, ssh-keyinstall, dns-check, more at: http://www.stearns.org
--------------------------------------------------------------------------