From jm@jmason.org Wed Jun 16 20:39:51 2004
From: jm@jmason.org
To: discuss@lists.surbl.org
Subject: [SURBL-Discuss] proxypots
Date: Wed, 16 Jun 2004 11:39:26 -0700
Message-ID: <20040616183927.4CECE590006@radish.jmason.org>
In-Reply-To: <1068018218.20040616015831@supranet.net>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="===============4512097550032427487=="
--===============4512097550032427487==
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Jeff Chan writes:
>We are also looking into some other potential spam
>URI data sources such as proxypots, etc.:
>
> http://proxypot.org/
Jeff --
a quick note on this; it has to be done very carefully. Many spammers are
using "link poisoning" stuff like this:
Get over 300 medicatlons online shlpped overnight to your front door with no prescrlption.
All of those are "www.{RANDOMWORD}.{com|net|org}". Eventually there's
one real link, which *is* SURBL-listed. These are chaff.
Now, SORBS for one seems to be listing some of these sites; presumably
because they have a spamtrap-driven feed without enough human moderation.
That's the danger here.
(btw, there's arguments to be made that a better selection mechanism
can "weed those out", but that needs to be careful too.
- - Ignore .org/.net/.com? spammer will use .biz, .info, and ccTLDs.
- - Ignore 0-length links (<a href="..."></a>)? spammer will change
to use <a href="...">{RANDOMWORD}</a>.
- - Ignore "dictionary words" somehow? spammer will use random URLs
from google, so "real" sites.
so I don't think those approaches have much merit alone.)
- --j.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)
Comment: Exmh CVS
iD8DBQFA0JPeQTcbUG5Y7woRAnYYAJ9/fZaT3WLmU+gT8aAnT2rcduDo7QCg6BE1
dF1r9ciWtFpEdC4OBHdRSKE=
=mnKX
-----END PGP SIGNATURE-----
--===============4512097550032427487==--
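The "0-length link" chaff described in the message above can be spotted mechanically, even though, as Justin notes, spammers can adapt to any single rule. What follows is only a minimal sketch, assuming Python's standard `html.parser` module; the names `AnchorCollector` and `zero_length_links` and the `.example` hostnames are illustrative and do not come from the thread.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect (href, visible_text) pairs for every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the anchor currently open, if any
        self._text = []      # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._finish()  # tolerate anchors that were never closed
                self._href = href
                self._text = []

    def handle_endtag(self, tag):
        if tag == "a":
            self._finish()

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def _finish(self):
        if self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None

    def close(self):
        super().close()
        self._finish()


def zero_length_links(html):
    """Return the hrefs of anchors that enclose no visible text."""
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    return [href for href, text in parser.links if text == ""]


# Hypothetical sample in the style of the spam quoted above.
sample = (
    'Get ov<a href="http://www.chaff-one.example"></a>er 300 items '
    '<a href="http://shop.example/">order here</a>'
    '<a href="http://www.chaff-two.example"></a>'
)
print(zero_length_links(sample))
```

A link flagged this way is a candidate for exclusion from a feed, not proof of anything: the check says nothing about chaff links that carry visible text.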
From raymond@prolocation.net Wed Jun 16 20:45:04 2004
From: Raymond Dijkxhoorn
To: discuss@lists.surbl.org
Subject: Re: [SURBL-Discuss] proxypots
Date: Wed, 16 Jun 2004 20:45:03 +0200
Message-ID:
In-Reply-To: <20040616183927.4CECE590006@radish.jmason.org>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="===============0830626804604201994=="
--===============0830626804604201994==
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
Hi Justin,
> All of those are "www.{RANDOMWORD}.{com|net|org}". Eventually there's
> one real link, which *is* SURBL-listed. These are chaff.
>
> Now, SORBS for one seems to be listing some of these sites; presumably
> because they have a spamtrap-driven feed without enough human moderation.
> That's the danger here.
>
>
> (btw, there's arguments to be made that a better selection mechanism
> can "weed those out", but that needs to be careful too.
>
> - - Ignore .org/.net/.com? spammer will use .biz, .info, and ccTLDs.
> - - Ignore 0-length links ()? spammer will change
> to use {RANDOMWORD}.
> - - Ignore "dictionary words" somehow? spammer will use random URLs
> from google, so "real" sites.
>
> so I don't think those approaches have much merit alone.)
It's 'just' an extra source ... on my pot I found a couple of domains that
were indeed spammer domains but not listed yet. It involves some manual
action, but I think they make nice additions.
Bye,
Raymond.
--===============0830626804604201994==--
From jeffc@surbl.org Wed Jun 16 21:16:36 2004
From: Jeff Chan
To: discuss@lists.surbl.org
Subject: Re: [SURBL-Discuss] proxypots
Date: Wed, 16 Jun 2004 12:16:25 -0700
Message-ID: <728573649.20040616121625@supranet.net>
In-Reply-To:
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="===============1280766296776530618=="
--===============1280766296776530618==
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
On Wednesday, June 16, 2004, 11:39:26 AM, Justin Mason wrote:
> a quick note on this; it has to be done very carefully. Many spammers are
> using "link poisoning" stuff like this:
> Get ov href="http://www.gimbel.org">er 300 medicat size=3>lons online shlpp href="http://www.omniscient.com">ed over href="http://www.proton.net">nig href="http://www.cravet.org">ht to your fr href="http://www.aristotelean.org">ont do href="http://www.barnacle.com">or with no pr href="http://www.lordosis.net">escr size=3>lption.
On Wednesday, June 16, 2004, 11:45:03 AM, Raymond Dijkxhoorn wrote:
(Justin wrote:)
>> All of those are "www.{RANDOMWORD}.{com|net|org}". Eventually there's
>> one real link, which *is* SURBL-listed. These are chaff.
>>
>> Now, SORBS for one seems to be listing some of these sites; presumably
>> because they have a spamtrap-driven feed without enough human moderation.
>> That's the danger here.
Yes, I agree poisoning could definitely be a problem. Thanks
for the confirmation of that. I'm not going to rush into this or
do anything without a lot of care. If a method is unsound, we
won't pick it up.
>> (btw, there's arguments to be made that a better selection mechanism
>> can "weed those out", but that needs to be careful too.
>>
>> - - Ignore .org/.net/.com? spammer will use .biz, .info, and ccTLDs.
>> - - Ignore 0-length links ()? spammer will change
>> to use {RANDOMWORD}.
>> - - Ignore "dictionary words" somehow? spammer will use random URLs
>> from google, so "real" sites.
>>
>> so I don't think those approaches have much merit alone.)
Agreed.
I was going to propose taking the top Nth percentile of reports,
hopefully from a large base of pots, but a large poisoner could
break into that too.
Another approach, which Outblaze apparently applies to the
domains they block on, is to list only domains that have been
registered within the last 90 days. The principle is that
newness is a good partial predictor of spamminess, and that
could have some value.
All of the above may not be enough to obtain good results
automatically, mainly due to the poisoning problem you
mention.
> It's 'just' an extra source ... on my pot I found a couple of domains that
> were indeed spammer domains but not listed yet. It involves some manual
> action, but I think they make nice additions.
Hand-checking could make it feasible.
Jeff C.
--===============1280766296776530618==--
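The two filters Jeff mentions (keeping only the most widely reported domains, and keeping only recently registered ones) can be combined as in the rough sketch below. It rests on several assumptions that are not in the thread: reports arrive as `(pot_id, domain)` pairs, "top percentile" is read as a fraction of domains ranked by how many distinct pots reported them, and registration dates come from a caller-supplied mapping rather than a live WHOIS lookup. The function name `candidate_domains` and all parameter names are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta


def candidate_domains(reports, registered_on, today,
                      top_fraction=0.1, max_age_days=90):
    """Return widely reported, recently registered domains.

    reports       -- iterable of (pot_id, domain) pairs
    registered_on -- dict mapping domain -> registration date (assumed known)
    today         -- reference date for the age check
    """
    pots_by_domain = defaultdict(set)
    for pot_id, domain in reports:
        pots_by_domain[domain.lower()].add(pot_id)

    # Rank by number of distinct pots; ties are broken alphabetically.
    ranked = sorted(pots_by_domain,
                    key=lambda d: (-len(pots_by_domain[d]), d))
    keep = max(1, int(len(ranked) * top_fraction)) if ranked else 0

    cutoff = today - timedelta(days=max_age_days)
    result = []
    for domain in ranked[:keep]:
        registered = registered_on.get(domain)
        # Domains with an unknown registration date are skipped, not listed.
        if registered is not None and registered >= cutoff:
            result.append(domain)
    return result


# Hypothetical data for illustration only.
reports = [
    ("pot1", "new-spam.example"), ("pot2", "new-spam.example"),
    ("pot3", "new-spam.example"),
    ("pot1", "old-site.example"), ("pot2", "old-site.example"),
    ("pot1", "chaff1.example"), ("pot2", "chaff2.example"),
]
registered_on = {
    "new-spam.example": date(2004, 6, 1),
    "old-site.example": date(1998, 1, 1),
}
print(candidate_domains(reports, registered_on, date(2004, 6, 16),
                        top_fraction=0.5))
```

As the message itself warns, a large enough poisoner can still push chaff into the top of the ranking, so the output is better treated as a queue for hand-checking than as a list to publish.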
From jm@jmason.org Wed Jun 16 21:35:30 2004
From: jm@jmason.org
To: discuss@lists.surbl.org
Subject: Re: [SURBL-Discuss] proxypots
Date: Wed, 16 Jun 2004 12:35:00 -0700
Message-ID: <20040616193501.DEC51590006@radish.jmason.org>
In-Reply-To: <728573649.20040616121625@supranet.net>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="===============2522688583547294253=="
--===============2522688583547294253==
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Jeff Chan writes:
> I was going to propose taking the top Nth percentile of reports,
> hopefully from a large base of pots, but a large poisoner could
> break into that too.
hmm. probably helpful, as long as they generate random URLs for
every mail.
> Another approach, which Outblaze apparently applies to the
> domains they block on, is to list only domains that have been
> registered within the last 90 days. The principle is that
> newness is a good partial predictor of spamminess, and that
> could have some value.
yes, that would probably help...
> All of the above may not be enough to obtain good results
> automatically, mainly due to the poisoning problem you
> mention.
>
> > It's 'just' an extra source ... on my pot I found a couple of domains that
> > were indeed spammer domains but not listed yet. It involves some manual
> > action, but I think they make nice additions.
>
> Hand-checking could make it feasible.
definitely, that's the key. Even checking the URLs (dump the text
with "lynx -dump" for example) would probably help.
- --j.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)
Comment: Exmh CVS
iD8DBQFA0KDkQTcbUG5Y7woRAoXxAKCkkmKNnHXHkKNkhwhO42LHlAfU/QCgpv3K
EAdanarGX05b93AbFKnxkkw=
=+FcL
-----END PGP SIGNATURE-----
--===============2522688583547294253==--
From martins@ensmp.fr Wed Jun 16 22:45:41 2004
From: Jose Marcio Martins da Cruz
To: discuss@lists.surbl.org
Subject: Re: [SURBL-Discuss] proxypots
Date: Wed, 16 Jun 2004 22:45:33 +0200
Message-ID: <200406162045.i5GKjX3L025328@ensmp.fr>
In-Reply-To: <728573649.20040616121625@supranet.net>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="===============6217100308274493785=="
--===============6217100308274493785==
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
>
> On Wednesday, June 16, 2004, 11:39:26 AM, Justin Mason wrote:
> > a quick note on this; it has to be done very carefully. Many spammers are
> > using "link poisoning" stuff like this:
>
> > Get ov > href="http://www.gimbel.org">er 300 medicat > size=3>lons online shlpp > href="http://www.omniscient.com">ed over > href="http://www.proton.net">nig
> >> (btw, there's arguments to be made that a better selection mechanism
> >> can "weed those out", but that needs to be careful too.
> >>
> >> - - Ignore .org/.net/.com? spammer will use .biz, .info, and ccTLDs.
> >> - - Ignore 0-length links ()? spammer will change
> >> to use {RANDOMWORD}.
No! 0-length links are invisible. RANDOMWORDs or anything with length
greater than 0 are visible!
> >> - - Ignore "dictionary words" somehow? spammer will use random URLs
> >> from google, so "real" sites.
> >>
> >> so I don't think those approaches have much merit alone.)
I think I sent you a little output from my scripts, which help me
to manually validate URLs. It's enough to list all the URLs in the
spam and the number of times they appear, and you'll quickly see what
should be blacklisted.
> Hand-checking could make it feasible.
Yes. The best idea, IMO, is to find a better way to
present URLs with some hints and manually validate them
before adding them to the blacklist. This is how I do it.
Best
Joe
--===============6217100308274493785==--
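Joe's scripts are not shown in the thread, so the following is not his code. It is only a small sketch of the idea he describes: list the URLs seen in a batch of spam with how often they appear, so that a human reviewer can pick out the real targets. It assumes message bodies are available as plain strings and counts each hostname once per message, so that a single chaff-heavy message cannot dominate; `host_counts`, `review_table` and the `.example` hostnames are illustrative.

```python
import re
from collections import Counter

# Deliberately simple pattern: scheme plus hostname, nothing else.
URL_RE = re.compile(r"https?://([A-Za-z0-9.-]+)", re.IGNORECASE)


def host_counts(messages):
    """Count, per hostname, the number of messages that mention it."""
    counts = Counter()
    for body in messages:
        hosts = {m.group(1).lower().rstrip(".")
                 for m in URL_RE.finditer(body)}
        counts.update(hosts)
    return counts


def review_table(messages):
    """Format the counts as 'count  host' lines, most frequent first."""
    rows = sorted(host_counts(messages).items(),
                  key=lambda kv: (-kv[1], kv[0]))
    return "\n".join(f"{n:5d}  {host}" for host, n in rows)


# Hypothetical spam bodies for illustration only.
messages = [
    "buy http://pills.example/a now http://www.random1.example/",
    "http://pills.example/b and http://www.random2.example/",
    "last chance http://PILLS.example/c",
]
print(review_table(messages))
```

In this toy batch the repeated host rises to the top while the one-off chaff hosts sit at the bottom, which is the kind of hint a manual reviewer can act on quickly.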
From patrik@patrik.com Thu Jun 17 00:16:13 2004
From: Patrik Nilsson
To: discuss@lists.surbl.org
Subject: Re: [SURBL-Discuss] proxypots
Date: Thu, 17 Jun 2004 00:15:58 +0200
Message-ID: <5.2.0.5.0.20040617000730.0437ffe8@ulithi.infotropic.com>
In-Reply-To: <200406162045.i5GKjX3L025328@ensmp.fr>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="===============4932937412896791138=="
--===============4932937412896791138==
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
At 22:45 2004-06-16 +0200, Jose Marcio Martins da Cruz wrote:
> > On Wednesday, June 16, 2004, 11:39:26 AM, Justin Mason wrote:
> > >> - - Ignore .org/.net/.com? spammer will use .biz, .info, and ccTLDs.
> > >> - - Ignore 0-length links ()? spammer will change
> > >> to use {RANDOMWORD}.
>
>No! 0-length links are invisible. RANDOMWORDs or anything with length
>greater than 0 are visible!
The spammers have already started using one-char "words" as a way to get
around checks for 0-length links and still keep the links relatively
non-visible.
I wouldn't be surprised if some of them start using style sheets to hide
certain links in html-mail next, so I wouldn't count on them not using
something like {RANDOMWORD} in the
future...
Patrik
--===============4932937412896791138==--
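Patrik's two evasions (one-character anchor text, and links hidden with styles) suggest widening the test from "empty" to "barely visible". The sketch below is one hedged way to do that with Python's standard `html.parser`. It only inspects the inline `style` attribute on the `<a>` tag itself, so rules in a `<style>` block or on a parent element would slip past it. The class name, the reason strings and the default threshold of two characters are assumptions made for illustration.

```python
import re
from html.parser import HTMLParser

# Inline style fragments that commonly hide an element.
HIDING_STYLE = re.compile(
    r"display\s*:\s*none|visibility\s*:\s*hidden|font-size\s*:\s*0(?![\d.])",
    re.IGNORECASE,
)


class SuspiciousLinkFinder(HTMLParser):
    """Flag anchors that are hidden or show almost no text."""

    def __init__(self, min_visible_chars=2):
        super().__init__()
        self.min_visible_chars = min_visible_chars
        self.suspicious = []   # (href, reason) pairs
        self._href = None
        self._hidden = False
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        if attrs.get("href"):
            self._finish()  # tolerate anchors that were never closed
            self._href = attrs["href"]
            self._hidden = bool(HIDING_STYLE.search(attrs.get("style") or ""))
            self._text = []

    def handle_endtag(self, tag):
        if tag == "a":
            self._finish()

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def _finish(self):
        if self._href is None:
            return
        visible = "".join(self._text).strip()
        if self._hidden:
            self.suspicious.append((self._href, "hidden by inline style"))
        elif len(visible) < self.min_visible_chars:
            self.suspicious.append((self._href, "anchor text too short"))
        self._href = None


def suspicious_links(html, min_visible_chars=2):
    finder = SuspiciousLinkFinder(min_visible_chars)
    finder.feed(html)
    finder.close()
    finder._finish()  # flush an anchor left open at end of input
    return finder.suspicious


# Hypothetical sample for illustration only.
sample = (
    '<a href="http://a.example/">.</a> '
    '<a href="http://b.example/" style="display: none">lordosis</a> '
    '<a href="http://c.example/">Order here</a>'
)
print(suspicious_links(sample))
```

The threshold is a trade-off: raising it catches more chaff but also flags legitimate short links such as footnote markers, which is another reason to keep a human in the loop.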
From robb@hyperlink-interactive.co.uk Thu Jun 17 12:19:21 2004
From: Robert Brooks
To: discuss@lists.surbl.org
Subject: Re: [SURBL-Discuss] proxypots
Date: Thu, 17 Jun 2004 11:19:10 +0100
Message-ID: <40D1701E.8080707@hyperlink-interactive.co.uk>
In-Reply-To: <20040616193501.DEC51590006@radish.jmason.org>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="===============6534672092131804672=="
--===============6534672092131804672==
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
>>Hand-checking could make it feasible.
>
>
> definitely, that's the key. Even checking the URLs (dump the text
> with "lynx -dump" for example) would probably help.
Maybe the resulting HTML could be scored; perhaps SA could be adapted to
such a purpose. You could, for example, score based on the webserver IP
appearing in DNS blacklists, etc. You would have to be careful about your
user-agent to avoid spammers presenting clean content to the checker.
-- 
Robert Brooks, Network Manager, Cable & Wireless UK
http://hyperlink-interactive.co.uk/
Tel: +44 (0)20 7339 8600 Fax: +44 (0)20 7339 8601
- Help Microsoft stamp out piracy. Give Linux to a friend today! -
--===============6534672092131804672==--
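Of the scoring inputs Robert suggests, the DNS blacklist check on the webserver IP is the easiest to sketch. DNS blacklists are conventionally queried by reversing the IPv4 octets and appending the list's zone; an answer means "listed". The code below illustrates only that step, under stated assumptions: the zone `dnsbl.example.org` is a placeholder, the function names are hypothetical, and fetching the page or choosing a user-agent is left out entirely.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up for an IPv4 address in a DNSBL zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on bad input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip, zone):
    """Return True if the DNSBL zone returns an A record for the address.

    A resolver failure is treated as "not listed" here, which is a
    simplification: a timeout and a genuine negative look the same.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False
    return True


# 192.0.2.10 is a documentation address; the zone is a placeholder.
print(dnsbl_query_name("192.0.2.10", "dnsbl.example.org"))
```

A score built on this would normally combine several lists and weight them, in the same spirit as SpamAssassin rules, rather than treating any single listing as decisive.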