RFC: Public false positive reporting form

Jeff Chan

30 Aug 2004

1:33 p.m.

How does anyone feel about a public false positive reporting form? Such a form would include the reporting party, the domain(s) or IPs, reasons why it's a false positive, etc.

It could be processed publicly or privately in a bug tracking system. It would be hand checked by some of our experienced spam fighters. The results could be published or simply aggregated without public announcement into the overall whitelist.

Is this a good or bad idea?

What level of publication should there be?

What forms of proof (if any) does anyone like?

Other comments?

Jeff C.

Rob McEwen

30 Aug

2:57 p.m.

Jeff:

I think that there are two areas which present particular challenges.

(1) e-mail marketers who play it "both ways"... thus making it hard to use SURBL to catch their bad behavior without blocking legitimate mail

...AND...

(2) savvy spammers who manage to get significant amounts through in the first few minutes/hours BEFORE getting blocked by SURBL... in particular, the ones who already use the best strategies to get around all other types of filtering.

The quicker TTLs are helping with the savvy spammers. Also, I recall something about a newer version of SURBL which will use some kind of tracking to trace new domains back to older ones in order to attribute new spam to known and confirmed spammers, so that they would stay "attached" to their previous bad records and could be blacklisted faster. What ever became of this? (Did I explain this correctly?)

Anyway... even when these are done, we will STILL have some problems with the most savvy spammers.

Also, I think that a lot of people fear that, as we work towards eliminating the rest of the FPs, more and more spam from these e-mail marketers who play it "both ways" will get through and the overall catch rate for SURBL may drop by 10 or 20 percent (or whatever).

I'm willing to live with that... (gulp!)

BUT... I think that it would be great to integrate into a formal tracking system a way to categorize URIs into either or both of these groups ("SavvySpams" and "GrayMarketer" ...or whatever). That way, we can use this data to help us form better "rules" in our linguistic/heuristic filters. The idea is that, at this point, the spam that is getting through is much more focused than a large general pool of spam. This narrower focus should give us the tools to close any loopholes that SURBL might not catch.

I would also suggest that if a message's server address is already blocked by BOTH list.dsbl.org AND sbl-xbl.spamhaus.org, then it shouldn't be added to this particular list, for the sake of keeping the list focused. Whatever the disagreements about RBLs are, I think that EVERYONE would agree that this particular standard is a reliable (yet FP-safe and conservative) one for RBL blocking.
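A minimal sketch of the "already blocked by BOTH lists" rule, for illustration only. It assumes the usual DNSBL convention of querying the reversed IPv4 octets under the list's zone and treating any answer as "listed". The function names (`dnsbl_query_name`, `already_blocked_by_both`, `should_track`) are hypothetical, and the two zones are simply the ones named in the message; they may not be in service.

```python
import ipaddress
import socket
from typing import Callable

# Zones named in the message; treated here as plain configuration.
DNSBL_ZONES = ("list.dsbl.org", "sbl-xbl.spamhaus.org")


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build a DNSBL query name: reversed IPv4 octets followed by the zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def resolves(name: str) -> bool:
    """Default lookup: any A record means 'listed'; a failed lookup means 'not listed'."""
    try:
        socket.gethostbyname(name)
        return True
    except OSError:
        return False


def already_blocked_by_both(ip: str, lookup: Callable[[str], bool] = resolves) -> bool:
    """True only if the sending server's IP is listed in every zone above."""
    return all(lookup(dnsbl_query_name(ip, zone)) for zone in DNSBL_ZONES)


def should_track(ip: str, lookup: Callable[[str], bool] = resolves) -> bool:
    """Keep a sample for the focused list only if the two RBLs do not already cover it."""
    return not already_blocked_by_both(ip, lookup)
```

The `lookup` parameter is there so the rule can be exercised against a canned set of names without touching the network.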

I envision a "Gray page" which would list the top 10 gray marketers who are bad enough to be mentioned but not bad enough to get listed by SURBL, and the top 10 "savvy spammers" who are known to periodically (albeit temporarily) beat SURBL with their new domains. Subsequent offenders could be listed on following pages after the top 10 for each of these two categories. Each listing would include a link to more info about this spammer or series of spam. This "more info" page would also include samples of spam that hit real spamtraps (made anonymous) and, for the gray marketers, samples of legitimate mail with that particular URI.

FP-safe rules would also be suggested...

NOW... who would have the time to get all this together??? :)

Rob McEwen

Steven Champeon

5:22 p.m.

on Mon, Aug 30, 2004 at 08:57:53AM -0400, Rob McEwen wrote:

...

(2) savvy spammers who manage to get significant amounts through in the first few minutes/hours BEFORE getting blocked by SURBL... in particular, the ones who already use the best strategies to get around all other types of filtering.

Yeah, like 'Sergey Katchenko' or 'Ivan Drozdof'; based on the joe job bounces I'm seeing here, he's using multiple new-to-me domains in a given day's spam run, and differentiating them based on whose domains are being forged into the spamrun. So, neverexisted@dhtml-guis.com might send spam pointing to alexkardonMUNGED.com, and neverexisted@otherdomain might send spam with yamatotakeruMUNGED.com, etc. All in the same overnight run.

Fortunately, he's also using a fairly easy to crack sender forging script, so once I figure out his wordlist (it's more sophisticated than the words/words2 that comes with Linux/OSX) he'll be gone from here, anyway.

-- hesketh.com/inc. v: +1(919)834-2552 f: +1(919)834-2554 w: http://hesketh.com Buy "Cascading Style Sheets: Separating Content from Presentation, 2/e" today! http://www.amazon.com/exec/obidos/ASIN/159059231X/heskecominc-20/ref=nosim/

Jeff Chan

4 Sep

12:31 a.m.

On Monday, August 30, 2004, 5:57:53 AM, Rob McEwen wrote:

...

The quicker TTLs are helping with the savvy spammers. Also, I recall something about a newer version of SURBL which will use some kind of tracking to trace new domains back to older ones in order to attribute new spam to known and confirmed spammers, so that they would stay "attached" to their previous bad records and could be blacklisted faster. What ever became of this? (Did I explain this correctly?)

Yes, it's on my list of things to do. I will get to it eventually, and the new system could be a basis for generalized spam domain feed handling. Where there is a real-time feed of spam domains, as from SpamCop or perhaps good spamtraps, the reports can act like a voting system where more reports mean more likely spam.

Basing this on human reports, as from SpamCop, is probably best since automation usually leads to too many FPs.

The idea is to take the top N percent of reports (probably the top 85 percent), thus hopefully leaving behind the FPs in the low-level "noisy" bottom 15 percent.
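A rough sketch of the report-voting cutoff, under one reading of the paragraph above: each report is a vote for a domain, domains are ranked by vote count, and the lowest 15 percent of domains are set aside as noise. The function name and the `drop_percent` parameter are hypothetical, and the example.com-style domains in any usage would be placeholders, not real feed data.

```python
from collections import Counter
from typing import Dict, Iterable


def domains_above_noise_floor(reports: Iterable[str], drop_percent: int = 15) -> Dict[str, int]:
    """Count one vote per report and drop the least-reported domains.

    `reports` is a stream of reported domain names (one entry per report).
    The lowest `drop_percent` of distinct domains, ranked by report count,
    are discarded; ties at the boundary are broken alphabetically.
    """
    counts = Counter(reports)
    # Rank ascending by report count so the noisy low end comes first.
    ranked = sorted(counts.items(), key=lambda item: (item[1], item[0]))
    n_drop = len(ranked) * drop_percent // 100
    return dict(ranked[n_drop:])
```

A rank-based cutoff keeps the sketch short; a threshold on the raw report count would handle ties more evenly and may be closer to what a production feed would want.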

In addition, the IP address of the domain will be resolved and "remembered". Future domains that resolve to the same IPs will probably inherit the report counts of the whole IP range or some fraction thereof. That will catch spammers whose web sites are on the same IPs or range of IPs.
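One way the "remembered IP" idea could look, as a sketch only. The message does not say how wide a range is or what fraction is inherited, so the /24 grouping and the 0.5 fraction below are assumptions, and `IpReputation` is a hypothetical name. DNS resolution is left out: callers pass in the IP a domain resolved to.

```python
import ipaddress
from collections import defaultdict


class IpReputation:
    """Remember report counts per IP range and let new domains inherit part of them."""

    def __init__(self, prefix_len: int = 24, inherit_fraction: float = 0.5):
        self.prefix_len = prefix_len              # assumed range size (/24)
        self.inherit_fraction = inherit_fraction  # assumed share a new domain inherits
        self.range_counts = defaultdict(int)

    def _range_of(self, ip: str):
        # strict=False masks off the host bits, e.g. 192.0.2.77 -> 192.0.2.0/24.
        return ipaddress.ip_network(f"{ip}/{self.prefix_len}", strict=False)

    def record_reports(self, ip: str, count: int) -> None:
        """Add a reported domain's count to the range its IP falls in."""
        self.range_counts[self._range_of(ip)] += count

    def inherited_count(self, ip: str) -> float:
        """Starting count for a never-reported domain that resolves to `ip`."""
        return self.range_counts.get(self._range_of(ip), 0) * self.inherit_fraction
```

A new domain hosted next to heavily reported ones then starts with a nonzero count, so it needs fewer fresh reports before it crosses whatever listing threshold is in use.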

By the way, we have lowered the TTLs on the lists to 1 hour, then 25 minutes, now 20 minutes. We may try 15 minutes also. We will pick the shortest TTL that does not significantly increase DNS traffic. (At some TTL the DNS traffic will ramp up significantly; we will back off one time period from that point. We already know it's quite a bit higher at 10 minutes, but not much higher at 20 minutes, so that point is somewhere in between.) We are using real-world testing to find the optimum TTL for our type of data and application.
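The back-off procedure described above could be sketched roughly as follows: step from the longest TTL toward shorter ones and stop just before the step where traffic jumps. The function name `pick_ttl`, the `max_increase` ratio of 1.25, and the traffic figures in the usage lines are all made up for illustration; they are not measurements from the SURBL name servers.

```python
from typing import Dict


def pick_ttl(traffic_by_ttl: Dict[int, float], max_increase: float = 1.25) -> int:
    """Return the shortest TTL reached before DNS traffic ramps up.

    `traffic_by_ttl` maps a TTL in minutes to the query rate observed at that TTL.
    A step to a shorter TTL is accepted only if traffic grows by no more than
    `max_increase` times the traffic at the previous (longer) TTL.
    """
    ttls = sorted(traffic_by_ttl, reverse=True)  # longest TTL first
    chosen = ttls[0]
    for longer, shorter in zip(ttls, ttls[1:]):
        if traffic_by_ttl[shorter] > traffic_by_ttl[longer] * max_increase:
            break  # traffic ramps up at this step, so stay one period back
        chosen = shorter
    return chosen


# Hypothetical query rates, invented purely to exercise the function.
sample = {60: 100.0, 25: 105.0, 20: 110.0, 15: 120.0, 10: 200.0}
print(pick_ttl(sample))
```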

Jeff C.

Alex Broens

30 Aug

3:01 p.m.

----- Original Message -----
From: "Jeff Chan" jeffc@surbl.org
To: "SURBL Discuss" discuss@lists.surbl.org
Sent: Monday, August 30, 2004 1:33 PM
Subject: [SURBL-Discuss] RFC: Public false positive reporting form

...

How does anyone feel about a public false positive reporting form? Such a form would include the reporting party, the domain(s) or IPs, reasons why it's a false positive, etc.

Good idea...

...

It could be processed publicly or privately in a bug tracking system. It would be hand checked by some of our experienced spam fighters. The results could be published or simply aggregated without public announcement into the overall whitelist.

Privately

...