[SURBL-Discuss] proxypots

Jeff Chan jeffc at surbl.org
Wed Jun 16 13:16:25 CEST 2004


On Wednesday, June 16, 2004, 11:39:26 AM, Justin Mason wrote:
> a quick note on this; it has to be done very carefully.  Many spammers are
> using "link poisoning" stuff like this:

>       Get ov<A
>       href="http://www.gimbel.org"></A>er 300 medicat<B><FONT
>       size=3>l</FONT></B>ons online sh<B><FONT size=3>l</FONT></B>pp<A
>       href="http://www.omniscient.com"></A>ed over<A
>       href="http://www.proton.net"></A>nig<A
>       href="http://www.cravet.org"></A>ht to your fr<A
>       href="http://www.aristotelean.org"></A>ont do<A
>       href="http://www.barnacle.com"></A>or with no pr<A
>       href="http://www.lordosis.net"></A>escr<B><FONT
>       size=3>l</FONT></B>ption.</FONT>

On Wednesday, June 16, 2004, 11:45:03 AM, Raymond Dijkxhoorn wrote:
(Justin wrote:)
>> All of those are "www.{RANDOMWORD}.{com|net|org}".   Eventually there's
>> one real link, which *is* SURBL-listed.  These are chaff.
>> 
>> Now, SORBS for one seems to be listing some of these sites; presumably
>> because they have a spamtrap-driven feed without enough human moderation.
>> That's the danger here.

Yes, I agree poisoning could definitely be a problem.  Thanks
for the confirmation of that.  I'm not going to rush into this or
do anything without a lot of care.  If a method is unsound, we
won't pick it up.

>> (btw, there's arguments to be made that a better selection mechanism
>> can "weed those out", but that needs to be careful too.
>> 
>> - Ignore .org/.net/.com?  spammer will use .biz, .info, and ccTLDs.
>> - Ignore 0-length links (<a href=...></a>)?  spammer will change
>>   to use <a href=...>{RANDOMWORD}</a>.
>> - Ignore "dictionary words" somehow?  spammer will use random URLs
>>   from google, so "real" sites.
>> 
>> so I don't think those approaches have much merit alone.)

Agreed.
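
To make the 0-length link case concrete, here is a minimal sketch
(not something SURBL runs) that separates anchors with no visible
text from anchors that have some.  The names AnchorCollector and
split_links are made up for illustration, and as Justin says, a
spammer can defeat this by putting a random word inside the link,
so it would only ever be one signal among several.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect (href, visible_text) pairs for every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the anchor currently open, if any
        self._text = []      # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def split_links(html):
    """Return (visible, empty): hrefs with and without anchor text."""
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    visible = [href for href, text in parser.links if text]
    empty = [href for href, text in parser.links if not text]
    return visible, empty


if __name__ == "__main__":
    sample = ('Get ov<A href="http://www.gimbel.org"></A>er 300 '
              '<a href="http://example.com/buy">click here</a>')
    print(split_links(sample))
```

Note that an anchor wrapping only an image also counts as "empty"
here, which is one more reason not to rely on this test alone.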

I was going to propose taking the top Nth percentile of reports,
hopefully from a large base of pots, but a large poisoner could
break into that too.
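
As a rough sketch of the percentile idea, assuming reports arrive
as (pot_id, domain) pairs and that we count distinct pots per
domain rather than raw hits (both are my assumptions, and
top_reported is a made-up name), something like this would pick
the most widely reported domains:

```python
import math
from collections import defaultdict


def top_reported(reports, percentile=95.0):
    """Return domains whose distinct-pot count reaches the percentile cutoff.

    reports: iterable of (pot_id, domain) pairs.
    percentile: nearest-rank percentile of the per-domain counts, 0-100.
    """
    pots_by_domain = defaultdict(set)
    for pot_id, domain in reports:
        pots_by_domain[domain.lower()].add(pot_id)

    counts = {d: len(pots) for d, pots in pots_by_domain.items()}
    if not counts:
        return []

    # Nearest-rank percentile over the sorted counts.
    ordered = sorted(counts.values())
    rank = max(1, math.ceil(percentile / 100.0 * len(ordered)))
    cutoff = ordered[rank - 1]

    return sorted(d for d, c in counts.items() if c >= cutoff)


if __name__ == "__main__":
    reports = [
        (1, "spam.example"), (2, "spam.example"),
        (3, "spam.example"), (4, "spam.example"),
        (1, "chaff-one.example"),
        (2, "chaff-two.example"),
        (3, "chaff-three.example"),
    ]
    print(top_reported(reports, percentile=90))
```

Counting distinct pots means one pot flooded with the same chaff
domain does not inflate its score, but a poisoner who reaches many
pots can still push chaff over the cutoff.  If every domain has
the same count, everything passes, so a minimum count would be
needed as well.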

Another approach, which Outblaze apparently applies to the
domains they block on, is to list only domains that have been
registered within the last 90 days.  The principle is that
newness is a good partial predictor of spamminess, and that
could have some value.
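
A minimal sketch of that age filter follows.  It assumes the
registration dates have already been obtained somewhere else
(for example from whois) and are passed in as a dict; no lookup
is done here, and recently_registered is a made-up name.  I don't
know how Outblaze actually implements it.

```python
from datetime import date, timedelta


def recently_registered(registered_on, today, max_age_days=90):
    """Return domains registered within the last max_age_days days.

    registered_on: dict mapping domain -> datetime.date, or None if unknown.
    today: the reference date to measure age from.
    """
    cutoff = today - timedelta(days=max_age_days)
    return sorted(
        domain
        for domain, registered in registered_on.items()
        # Unknown or future dates are skipped rather than guessed at.
        if registered is not None and cutoff <= registered <= today
    )


if __name__ == "__main__":
    candidates = {
        "new-pills.example": date(2004, 5, 1),
        "old-site.example": date(1998, 2, 14),
        "unknown.example": None,
    }
    print(recently_registered(candidates, today=date(2004, 6, 16)))
```

Domains with no known registration date are dropped rather than
listed, which errs on the side of not listing.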

All of the above may not be enough to obtain good results
automatically, mainly due to the poisoning problem you
mention.

> It's 'just' an extra source, ... on my pot I found a couple of domains
> that were indeed spammer domains but not listed yet. It involves some
> manual action but I think it's a nice addition.

Hand-checking could make it feasible.

Jeff C.


