[SURBL-Discuss] proxypots

Justin Mason jm at jmason.org
Wed Jun 16 12:39:26 CEST 2004


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


Jeff Chan writes:
>We are also looking into some other potential spam
>URI data sources such as proxypots, etc.:
>
>  http://proxypot.org/

Jeff --

a quick note on this; it has to be done very carefully.  Many spammers are
using "link poisoning" stuff like this:

      Get ov<A
      href="http://www.gimbel.org"></A>er 300 medicat<B><FONT
      size=3>l</FONT></B>ons online sh<B><FONT size=3>l</FONT></B>pp<A
      href="http://www.omniscient.com"></A>ed over<A
      href="http://www.proton.net"></A>nig<A
      href="http://www.cravet.org"></A>ht to your fr<A
      href="http://www.aristotelean.org"></A>ont do<A
      href="http://www.barnacle.com"></A>or with no pr<A
      href="http://www.lordosis.net"></A>escr<B><FONT
      size=3>l</FONT></B>ption.</FONT>

All of those are "www.{RANDOMWORD}.{com|net|org}".  Eventually there's
one real link, which *is* SURBL-listed; the random-word links are just chaff.
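To make the danger concrete, here is a minimal Python sketch of what a naive, unmoderated URI feed might do with the sample above: collect every linked hostname and its anchor text. It uses only the standard library's `html.parser` and `urllib.parse`. The class and function names are hypothetical, and the sample string holds only the chaff portion quoted above, not the real SURBL-listed link.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

# The chaff portion of the "link poisoning" sample quoted above.
SAMPLE = '''Get ov<A
href="http://www.gimbel.org"></A>er 300 medicat<B><FONT
size=3>l</FONT></B>ons online sh<B><FONT size=3>l</FONT></B>pp<A
href="http://www.omniscient.com"></A>ed over<A
href="http://www.proton.net"></A>nig<A
href="http://www.cravet.org"></A>ht to your fr<A
href="http://www.aristotelean.org"></A>ont do<A
href="http://www.barnacle.com"></A>or with no pr<A
href="http://www.lordosis.net"></A>escr<B><FONT
size=3>l</FONT></B>ption.</FONT>'''


class LinkCollector(HTMLParser):
    """Collect [hostname, anchor_text] for every <a href=...> in a message."""

    def __init__(self):
        super().__init__()
        self.links = []        # list of [hostname, anchor_text]
        self._current = None   # link whose anchor text is being read, if any

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <A HREF> matches.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._current = [urlsplit(href).hostname, ""]
                self.links.append(self._current)

    def handle_data(self, data):
        # Only text between <a> and </a> counts as anchor text.
        if self._current is not None:
            self._current[1] += data

    def handle_endtag(self, tag):
        if tag == "a":
            self._current = None


def naive_extract(html):
    """Hypothetical naive feed: return every linked host, no questions asked."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    for host, text in naive_extract(SAMPLE):
        print(host, repr(text))
```

All seven hosts come back, each with empty anchor text, so a feed that lists whatever it extracts would submit seven chaff domains alongside the one real spam domain.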

Now, SORBS, for one, seems to be listing some of these sites, presumably
because it has a spamtrap-driven feed without enough human moderation.
That's the danger here: the innocent chaff domains end up listed.


(btw, there are arguments to be made that a better selection mechanism
can "weed those out", but that needs to be done carefully too.

- Ignore .org/.net/.com?  Spammers will use .biz, .info, and ccTLDs.
- Ignore 0-length links (<a href=...></a>)?  Spammers will change
  to use <a href=...>{RANDOMWORD}</a>.
- Ignore "dictionary words" somehow?  Spammers will use random URLs
  from Google, i.e. "real" sites.

so I don't think those approaches have much merit alone.)
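The three rules above, and the evasions predicted for each, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: each rule takes a (hostname, anchor_text) pair and returns True if the link should be ignored as chaff, the rule names are hypothetical, the tiny `DICTIONARY` set is a hypothetical stand-in for a real word list, and `www.jmason.org` stands in for a "real" site a spammer might pull from Google.

```python
# Hypothetical stand-in for a real word list.
DICTIONARY = {"omniscient", "proton", "aristotelean", "barnacle", "lordosis"}


def ignore_common_tld(host, text):
    """Rule 1: treat .com/.net/.org links as chaff (anchor text unused)."""
    return host.rsplit(".", 1)[-1] in {"com", "net", "org"}


def ignore_empty_anchor(host, text):
    """Rule 2: treat zero-length links, <a href=...></a>, as chaff."""
    return text.strip() == ""


def ignore_dictionary_word(host, text):
    """Rule 3: treat www.{WORD}.{tld} as chaff if WORD is in the word list."""
    labels = host.split(".")
    return len(labels) >= 2 and labels[-2] in DICTIONARY


# A chaff link exactly as it appears in the quoted sample.
ORIGINAL_CHAFF = ("www.barnacle.com", "")

# The one-step change a spammer would make to slip past each rule.
EVASIONS = {
    ignore_common_tld: ("www.barnacle.biz", ""),           # switch the TLD
    ignore_empty_anchor: ("www.barnacle.com", "barnacle"),  # add anchor text
    ignore_dictionary_word: ("www.jmason.org", ""),        # link a real site
}

if __name__ == "__main__":
    for rule, evasion in EVASIONS.items():
        print(f"{rule.__name__}: "
              f"catches sample chaff={rule(*ORIGINAL_CHAFF)}, "
              f"catches evasion={rule(*evasion)}")
```

Each rule catches the chaff as currently written and misses it after a single cheap change, which is why none of them holds up alone. The dictionary case is the worst of the three, since the links it then misses belong to genuine third-party sites.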

--j.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)
Comment: Exmh CVS

iD8DBQFA0JPeQTcbUG5Y7woRAnYYAJ9/fZaT3WLmU+gT8aAnT2rcduDo7QCg6BE1
dF1r9ciWtFpEdC4OBHdRSKE=
=mnKX
-----END PGP SIGNATURE-----