Jeff Chan writes:
We are also looking into some other potential spam URI data sources such as proxypots, etc.:
Jeff --
a quick note on this; it has to be done very carefully. Many spammers are using "link poisoning" stuff like this:
Get ov<A href="http://www.gimbel.org"></A>er 300 medicat<B><FONT size=3>l</FONT></B>ons online sh<B><FONT size=3>l</FONT></B>pp<A href="http://www.omniscient.com"></A>ed over<A href="http://www.proton.net"></A>nig<A href="http://www.cravet.org"></A>ht to your fr<A href="http://www.aristotelean.org"></A>ont do<A href="http://www.barnacle.com"></A>or with no pr<A href="http://www.lordosis.net"></A>escr<B><FONT size=3>l</FONT></B>ption.</FONT>
All of those are "www.{RANDOMWORD}.{com|net|org}". Eventually there's one real link, which *is* SURBL-listed. These are chaff.
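To make the chaff problem concrete, here is a minimal sketch (Python, stdlib only) of what a naive URI extractor sees in a message like the one above; the sample string and the `payload.example.com` domain are made up for illustration:

```python
import re

# Simplified reconstruction of the poisoned spam body above;
# payload.example.com stands in for the one real, listable link.
SPAM_HTML = (
    'Get ov<A href="http://www.gimbel.org"></A>er 300 medications online '
    'shipped over<A href="http://www.proton.net"></A>night with no '
    'pr<A href="http://www.lordosis.net"></A>escription '
    '<a href="http://payload.example.com/buy">click here</a>'
)

# Naive extraction treats every href alike, so the random-word
# chaff domains are indistinguishable from the payload domain.
hrefs = re.findall(r'href="(http://[^"]+)"', SPAM_HTML, re.IGNORECASE)
domains = [re.sub(r'^http://([^/]+).*$', r'\1', h) for h in hrefs]
print(domains)
```

A spamtrap feed built on extraction like this would report gimbel.org et al. right alongside the real target, which is exactly how innocent domains end up listed.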
Now, SORBS for one seems to be listing some of these sites; presumably because they have a spamtrap-driven feed without enough human moderation. That's the danger here.
(btw, there are arguments to be made that a better selection mechanism can "weed those out", but that needs to be done carefully too.
- Ignore .org/.net/.com? spammer will use .biz, .info, and ccTLDs.
- Ignore 0-length links (<a href=...></a>)? spammer will change to use <a href=...>{RANDOMWORD}</a>.
- Ignore "dictionary words" somehow? spammer will use random URLs from google, so "real" sites.
so I don't think those approaches have much merit alone.)
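For what it's worth, those candidate filters can be sketched as code, with the counter-move to each noted in a comment (a sketch only; the word list is made up):

```python
def looks_like_chaff(domain, anchor_text, dictionary):
    """Candidate chaff filters -- and why each fails on its own."""
    # Filter 1: common gTLD plus no visible anchor text.
    # Counter-move: spammer switches to .biz/.info/ccTLDs, or puts
    # a {RANDOMWORD} inside the <a> element so the text is non-empty.
    if domain.endswith(('.com', '.net', '.org')) and not anchor_text:
        return True
    # Filter 2: second-level label is a plain dictionary word.
    # Counter-move: spammer harvests random real URLs from google,
    # so the chaff domains are "real" sites with real content.
    label = domain.split('.')[-2] if '.' in domain else domain
    return label in dictionary

words = {'gimbel', 'proton', 'barnacle'}
print(looks_like_chaff('www.proton.net', '', words))      # True
print(looks_like_chaff('surbl-listed.biz', 'x', words))   # False
```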
--j.
Hi Justin,
All of those are "www.{RANDOMWORD}.{com|net|org}". Eventually there's one real link, which *is* SURBL-listed. These are chaff.
Now, SORBS for one seems to be listing some of these sites; presumably because they have a spamtrap-driven feed without enough human moderation. That's the danger here.
(btw, there are arguments to be made that a better selection mechanism can "weed those out", but that needs to be done carefully too.
- Ignore .org/.net/.com? spammer will use .biz, .info, and ccTLDs.
- Ignore 0-length links (<a href=...></a>)? spammer will change to use <a href=...>{RANDOMWORD}</a>.
- Ignore "dictionary words" somehow? spammer will use random URLs from google, so "real" sites.
so I don't think those approaches have much merit alone.)
It's 'just' an extra source... on my pot I found a couple of domains that were indeed spammer domains but not yet listed. It involves some manual action, but I think it's a nice addition.
Bye, Raymond.
On Wednesday, June 16, 2004, 11:39:26 AM, Justin Mason wrote:
a quick note on this; it has to be done very carefully. Many spammers are using "link poisoning" stuff like this:
Get ov<A href="http://www.gimbel.org"></A>er 300 medicat<B><FONT size=3>l</FONT></B>ons online sh<B><FONT size=3>l</FONT></B>pp<A href="http://www.omniscient.com"></A>ed over<A href="http://www.proton.net"></A>nig<A href="http://www.cravet.org"></A>ht to your fr<A href="http://www.aristotelean.org"></A>ont do<A href="http://www.barnacle.com"></A>or with no pr<A href="http://www.lordosis.net"></A>escr<B><FONT size=3>l</FONT></B>ption.</FONT>
On Wednesday, June 16, 2004, 11:45:03 AM, Raymond Dijkxhoorn wrote (quoting Justin):
All of those are "www.{RANDOMWORD}.{com|net|org}". Eventually there's one real link, which *is* SURBL-listed. These are chaff.
Now, SORBS for one seems to be listing some of these sites; presumably because they have a spamtrap-driven feed without enough human moderation. That's the danger here.
Yes, I agree poisoning could definitely be a problem. Thanks for the confirmation of that. I'm not going to rush into this or do anything without a lot of care. If a method is unsound, we won't pick it up.
(btw, there are arguments to be made that a better selection mechanism can "weed those out", but that needs to be done carefully too.
- Ignore .org/.net/.com? spammer will use .biz, .info, and ccTLDs.
- Ignore 0-length links (<a href=...></a>)? spammer will change to use <a href=...>{RANDOMWORD}</a>.
- Ignore "dictionary words" somehow? spammer will use random URLs from google, so "real" sites.
so I don't think those approaches have much merit alone.)
Agreed.
I was going to propose taking the top Nth percentile of reports, hopefully from a large base of pots, but a large poisoner could break into that too.
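A rough sketch of that percentile idea (Python; the per-domain report counts are hypothetical), which also shows the weakness: a poisoner repeating the same chaff domains at high volume would cross the threshold too:

```python
from collections import Counter

# Hypothetical per-domain report counts aggregated from many pots.
reports = Counter({
    'payload-a.example.com': 950,
    'payload-b.example.net': 870,
    'gimbel.org': 3,
    'proton.net': 2,
    'barnacle.com': 1,
})

def top_percentile(counts, pct):
    """Keep only domains whose report count reaches the pct-th
    percentile value across the feed."""
    values = sorted(counts.values())
    cutoff = values[min(int(len(values) * pct / 100.0), len(values) - 1)]
    return {d for d, n in counts.items() if n >= cutoff}

print(top_percentile(reports, 60))
```

Randomized chaff stays in the long tail and gets filtered; repeated chaff would not.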
Another approach, which Outblaze apparently applies when deciding which domains to block, is to only list domains that have been registered within the last 90 days. The principle is that newness is a good partial predictor of spamminess, so that could have some value.
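In sketch form (Python; extracting the creation date from whois output is left aside, since whois formats vary wildly per registry):

```python
from datetime import date, timedelta

def recently_registered(created, as_of=None, window_days=90):
    """True if the domain was registered inside the window.
    `created` must already be parsed out of whois data, which is
    the hard part in practice -- formats differ per registry."""
    as_of = as_of or date.today()
    return (as_of - created) <= timedelta(days=window_days)

# A domain registered 46 days before the check falls in the window;
# one from the previous year does not.
print(recently_registered(date(2004, 5, 1), as_of=date(2004, 6, 16)))   # True
print(recently_registered(date(2003, 1, 1), as_of=date(2004, 6, 16)))   # False
```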
All of the above may not be enough to obtain good results automatically, mainly due to the poisoning problem you mention.
It's 'just' an extra source... on my pot I found a couple of domains that were indeed spammer domains but not yet listed. It involves some manual action, but I think it's a nice addition.
Hand-checking could make it feasible.
Jeff C.
Jeff Chan writes:
I was going to propose taking the top Nth percentile of reports, hopefully from a large base of pots, but a large poisoner could break into that too.
hmm. probably helpful, as long as they generate random URLs for every mail.
Another approach, which Outblaze apparently applies when deciding which domains to block, is to only list domains that have been registered within the last 90 days. The principle is that newness is a good partial predictor of spamminess, so that could have some value.
yes, that would probably help...
All of the above may not be enough to obtain good results automatically, mainly due to the poisoning problem you mention.
It's 'just' an extra source... on my pot I found a couple of domains that were indeed spammer domains but not yet listed. It involves some manual action, but I think it's a nice addition.
Hand-checking could make it feasible.
definitely, that's the key. Even checking the URLs (dump the text with "lynx -dump" for example) would probably help.
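That check could be scripted -- a sketch (Python), assuming `lynx` is installed and adding a subprocess timeout so a stalling site doesn't hang the reviewer:

```python
import subprocess

def dump_url(url, timeout=15):
    """Render a URL to plain text with `lynx -dump` for hand review.
    Returns None if lynx is missing, fails, or the site stalls."""
    try:
        out = subprocess.run(
            ['lynx', '-dump', '-nolist', url],
            capture_output=True, text=True, timeout=timeout)
    except (subprocess.TimeoutExpired, OSError):
        return None
    return out.stdout if out.returncode == 0 else None

# text = dump_url('http://www.example.com/')  # eyeball the result
```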
--j.
Hand-checking could make it feasible.
definitely, that's the key. Even checking the URLs (dump the text with "lynx -dump" for example) would probably help.
Maybe the resulting HTML could be scored; perhaps SA could be adapted to that purpose. You could, for example, score based on the webserver IP appearing in DNS blacklists, etc. You would have to be careful about your user-agent to avoid spammers presenting clean content to the checker.
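A sketch of that DNSBL lookup (Python, stdlib only; `sbl-xbl.spamhaus.org` is just an example zone, and the careful User-Agent choice would apply to the content fetch, which is not shown):

```python
import socket

def ip_on_dnsbl(ip, zone='sbl-xbl.spamhaus.org'):
    """Check a webserver IP against a DNS blacklist: reverse the
    octets and query <reversed-ip>.<zone>; any A-record answer
    means the IP is listed."""
    query = '.'.join(reversed(ip.split('.'))) + '.' + zone
    try:
        socket.gethostbyname(query)
        return True
    except socket.gaierror:
        return False

# server_ip = socket.gethostbyname('www.example.com')
# print(ip_on_dnsbl(server_ip))
```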
On Wednesday, June 16, 2004, 11:39:26 AM, Justin Mason wrote:
a quick note on this; it has to be done very carefully. Many spammers are using "link poisoning" stuff like this:
Get ov<A href="http://www.gimbel.org"></A>er 300 medicat<B><FONT size=3>l</FONT></B>ons online sh<B><FONT size=3>l</FONT></B>pp<A href="http://www.omniscient.com"></A>ed over<A href="http://www.proton.net"></A>nig<A
(btw, there are arguments to be made that a better selection mechanism can "weed those out", but that needs to be done carefully too.
- Ignore .org/.net/.com? spammer will use .biz, .info, and ccTLDs.
- Ignore 0-length links (<a href=...></a>)? spammer will change to use <a href=...>{RANDOMWORD}</a>.
No! 0-length links are invisible; {RANDOMWORD}s or anything with length greater than 0 are visible!
- Ignore "dictionary words" somehow? spammer will use random URLs from google, so "real" sites.
so I don't think those approaches have much merit alone.)
I think I sent you a little output of my scripts which help me manually validate URLs. It's enough to list all URLs in the spam and the number of times they appear, and you'll quickly see what should be blacklisted.
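Something like that tally can be sketched in a few lines (Python; the sample corpus here is made up):

```python
import re
from collections import Counter

def tally_urls(messages):
    """List every URL host seen across a spam corpus with its
    frequency -- the high-count hosts are blacklist candidates."""
    counts = Counter()
    for body in messages:
        for host in re.findall(r'https?://([^/\s"\'>]+)', body):
            counts[host.lower()] += 1
    return counts.most_common()

corpus = [
    'buy at http://pills.example.biz/ now',
    'see http://pills.example.biz/offer and http://www.gimbel.org',
]
print(tally_urls(corpus))
```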
Hand-checking could make it feasible.
Yes. The best approach, IMO, is to find a good way to present URLs with some hints and manually validate them before adding them to the blacklist. This is how I do it.
Best
Joe
Jeff C.
Discuss mailing list Discuss@lists.surbl.org http://lists.surbl.org/mailman/listinfo/discuss
At 22:45 2004-06-16 +0200, Jose Marcio Martins da Cruz wrote:
On Wednesday, June 16, 2004, 11:39:26 AM, Justin Mason wrote:
- Ignore .org/.net/.com? spammer will use .biz, .info, and ccTLDs.
- Ignore 0-length links (<a href=...></a>)? spammer will change to use <a href=...>{RANDOMWORD}</a>.
No! 0-length links are invisible; {RANDOMWORD}s or anything with length greater than 0 are visible!
The spammers have already started using one-char "words" as a way to get around checks for 0-length links and still keep the links relatively non-visible.
I wouldn't be surprised if some of them start using style sheets to hide certain links in html-mail next, so I wouldn't count on them not using something like <a href=... class="my_bgcolored_A">{RANDOMWORD}</a> in the future...
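If it comes to that, catching "barely visible" anchor text can be sketched like this (Python; note the style-sheet trick needs a real CSS parse, which this deliberately does not attempt):

```python
import re

def suspicious_anchor(html_anchor):
    """Flag <a> elements whose visible text is empty or a single
    character -- the tricks described above. CSS-hidden text
    (class/style based) is NOT detected here."""
    m = re.match(r'<a\s[^>]*>(.*?)</a>', html_anchor, re.I | re.S)
    if not m:
        return False
    # Strip any nested markup, then measure what a reader would see.
    text = re.sub(r'<[^>]+>', '', m.group(1)).strip()
    return len(text) <= 1

print(suspicious_anchor('<a href="http://www.gimbel.org"></a>'))        # True
print(suspicious_anchor('<a href="http://www.gimbel.org">x</a>'))       # True
print(suspicious_anchor('<a href="http://example.com">click here</a>')) # False
```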
Patrik