From jm@jmason.org Wed Jun 16 20:39:51 2004
From: jm@jmason.org
To: discuss@lists.surbl.org
Subject: [SURBL-Discuss] proxypots
Date: Wed, 16 Jun 2004 11:39:26 -0700
Message-ID: <20040616183927.4CECE590006@radish.jmason.org>
In-Reply-To: <1068018218.20040616015831@supranet.net>

Jeff Chan writes:
>We are also looking into some other potential spam
>URI data sources such as proxypots, etc.:
>
>  http://proxypot.org/

Jeff -- a quick note on this; it has to be done very carefully.
Many spammers are using "link poisoning" stuff like this:

  Get over 300 medicatlons online shlpped overnight to your front
  door with no prescrlption.

All of the links in it are "www.{RANDOMWORD}.{com|net|org}".
Eventually there's one real link, which *is* SURBL-listed. The rest
are chaff.

Now, SORBS for one seems to be listing some of these sites, presumably
because they have a spamtrap-driven feed without enough human
moderation. That's the danger here.

(btw, there are arguments to be made that a better selection mechanism
can "weed those out", but that needs to be done carefully too:

  - Ignore .org/.net/.com? Spammers will use .biz, .info, and ccTLDs.

  - Ignore 0-length links ()? Spammers will change to use {RANDOMWORD}.

  - Ignore "dictionary words" somehow? Spammers will use random URLs
    from Google, so "real" sites.

so I don't think those approaches have much merit alone.)

--j.
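
To make the first objection concrete, here is a rough sketch of a "weed those out" rule and how easily it is sidestepped. Everything in it is an assumption for illustration only: the `candidate_hosts` helper, the regex, and the hostnames are invented placeholders, not anything SURBL or SORBS actually runs.

```python
import re
from urllib.parse import urlparse

# Hypothetical heuristic: treat "www.<word>.{com,net,org}" hosts as chaff
# and never list them.
CHAFF_HOST = re.compile(r"^www\.[a-z]+\.(com|net|org)$")

def candidate_hosts(urls):
    """Return the hosts that survive the naive chaff filter,
    i.e. the ones an unmoderated feed would go on to list."""
    hosts = []
    for url in urls:
        host = (urlparse(url).hostname or "").lower()
        if host and not CHAFF_HOST.match(host):
            hosts.append(host)
    return hosts

# Invented sample data: three chaff links plus one real payload link.
round_one = [
    "http://www.example.com/",
    "http://www.example.net/",
    "http://www.example.org/",
    "http://pills.example/buy",
]

# The spammer reacts by moving the chaff to other TLDs.
round_two = [
    "http://www.example.biz/",
    "http://www.example.info/",
    "http://pills.example/buy",
]

print(candidate_hosts(round_one))
print(candidate_hosts(round_two))
```

In the first round only the payload host survives; in the second the .biz and .info chaff hosts pass straight through the filter and would be listed alongside it, which is the failure mode described above.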