After working through today's spam I was amused to see that we (almost) only received spam for 3 sites:
The following (some old) URLs to these sites are still alive:
- sublunary1132nx.com
- ourpillsdirect.com
- naturalwellnessessence.com
Naturally all domains were added to the Prolocation RBL and the (low) spam volume decreased within the hour :). But I was wondering ... a small test with a simple 'diff' script showed me that comparing the output of URLs found in 'fresh' spam with known spam sites is doable. These guys seem to be changing domains every 8 hours or so...
Would it be bad to have some of these (stupid) 'static webpage hosting' spammers automatically added to the WS list by comparing the output of their home page advertised in the URL?
It's fairly easy to create a script to do this ... that's not the issue. What could go wrong? Any input would be appreciated :)
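(For illustration only, a rough sketch of the kind of check proposed here. It assumes the LWP::Simple and Digest::MD5 modules, and "known-pages.md5" is a hypothetical file with one MD5 fingerprint per line taken from the home pages of already-confirmed spam sites; a real script would also want timeouts, redirect limits and some normalisation of the HTML before hashing.)

#!/usr/bin/perl -w
use strict;
use LWP::Simple qw(get);
use Digest::MD5 qw(md5_hex);

# Load fingerprints of home pages already confirmed as spam sites.
my %known;
open(KNOWN, "< known-pages.md5") or die "known-pages.md5: $!";
while (<KNOWN>) { chomp; $known{$_} = 1 }
close(KNOWN);

# Each argument is a URL pulled from fresh trap spam.
foreach my $url (@ARGV) {
    my $page = get($url);
    next unless defined $page;          # site unreachable or already gone
    print "$url matches a known spam page\n" if $known{md5_hex($page)};
}

An exact-hash match is the strictest form of the 'diff' idea: it only fires when a new domain serves a byte-identical page to one already confirmed, which keeps false positives down at the cost of missing pages with small per-domain tweaks.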
Secondly, while doing these tests I noticed that a lot of the sites listed in (our) WS list are no longer 'alive'. Is there any clean-up procedure defined yet? ... or will the list just keep on growing ;)
bye, Chris
On Sunday, August 22, 2004, 3:16:42 PM, Christiaan Besten wrote:
a small test with a simple 'diff' script showed me that comparing the output of URLs found in 'fresh' spam with known spam sites is doable. These guys seem to be changing domains every 8 hours or so...
Would it be bad to have some of these (stupid) 'static webpage hosting' spammers automatically added to the WS list by comparing the output of their home page advertised in the URL?
It's fairly easy to create a script to do this ... that's not the issue. What could go wrong? Any input would be appreciated :)
(Welcome to the list, Chris! Same to the other new folks!)
Could be interesting, but how do you discover the new domain names?
Secondly, while doing these tests I noticed that a lot of the sites listed in (our) WS list are no longer 'alive'. Is there any clean-up procedure defined yet? ... or will the list just keep on growing ;)
Yes, there is a clean-up that Bill Stearns has done to get rid of older domains. Presumably we'll do that occasionally to prune the list of old, non-functional domains.
Jeff C.
Could be interesting, but how do you discover the new domain names?
From fresh spam received at the spam-trap(s) ... from my point of view, the sooner we add them to WS the better ... These sites are really 'common' spammers (ehm, referrers officially), but are very easily detected/confirmed in an automated way...
bye, Chris
On Sunday, August 22, 2004, 4:05:11 PM, Christiaan Besten wrote:
Could be interesting, but how do you discover the new domain names?
From fresh spam received at the spam-trap(s) ... from my point of view, the sooner we add them to WS the better ... These sites are really 'common' spammers (ehm, referrers officially), but are very easily detected/confirmed in an automated way...
If you're pretty confident your scripts can catch the spam domains with few false positives, I'd say go for it.
Your methodology in general sounds good to me:
1. Spam domain appears in traps.
2. Web page is static and appears in other spams.
3. Add to WS.
Perhaps you could publish your scripts for review, if you'd like that. :-)
Comments welcomed and encouraged.
Jeff C.
If you're pretty confident your scripts can catch the spam domains with few false positives, I'd say go for it.
It should only 'detect' sites that match ones already on (our) WS list ... I think that will hardly ever give FPs. In the beginning we could let the script check only against (manually added) known spam sites; this should prevent FPs even more...
Your methodology in general sounds good to me:
- Spam domain appears in traps.
- Web page is static and appears in other spams.
- Add to WS.
Perhaps you could publish your scripts for review, if you'd like that. :-)
Sure... I would first like to see if anyone can think of a flaw in this method. If no one can think of anything, I could try to code this in the next couple of days. I will publish the result as soon as it's available.
I think the most complicated part is filtering usable (non-hidden) URLs out of the received spam. I was thinking of reusing code designed by the SA crew. Has anyone tried that before?
bye, Chris
On Sunday, August 22, 2004, 4:54:36 PM, Christiaan Besten wrote:
I think the most complicated part is filtering usable (non-hidden) URLs out of the received spam. I was thinking of reusing code designed by the SA crew. Has anyone tried that before?
Have not tried it, but agree it's a good approach. Message and URI parsing from spams can be non-trivial.
Jeff C.
Jeff Chan writes:
On Sunday, August 22, 2004, 4:54:36 PM, Christiaan Besten wrote:
I think the most complicated part is filtering usable (non-hidden) URLs out of the received spam. I was thinking of reusing code designed by the SA crew. Has anyone tried that before?
Have not tried it, but agree it's a good approach. Message and URI parsing from spams can be non-trivial.
I'd recommend:
1. a *really* simple SpamAssassin 3.0.0 plugin be written, that just dumps $scanner->get_uri_list() to STDOUT. (this is *really* easy. honest)
2. create a config file that loads that plugin and sets up a fake "rule" that runs it.
3. Then when you want to grab URLs from a spam mail, run "spamassassin -c configfile -L -t < msg" on it; to process a bigger batch of spam, use "mass-check -c configfile".
4. Profit!
if someone does this, please let me know how they find the doco, etc., and put a page up on the SpamAssassin wiki about it... I'm trying to encourage more plugins ;)
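(For illustration, a rough sketch of what steps 1 and 2 might look like; it assumes SpamAssassin 3.0.x, and the names DumpURIs, dump_uris and DUMP_URIS are arbitrary placeholders. Note the @INC caveat mentioned further down the thread about where 3.0.0 can load plugin files from.)

# DumpURIs.pm -- dump every URI SpamAssassin extracts from a message
package DumpURIs;

use strict;
use Mail::SpamAssassin::Plugin;
our @ISA = qw(Mail::SpamAssassin::Plugin);

sub new {
    my ($class, $mailsa) = @_;
    my $self = $class->SUPER::new($mailsa);
    bless($self, $class);
    $self->register_eval_rule("dump_uris");
    return $self;
}

sub dump_uris {
    my ($self, $permsgstatus) = @_;
    print "$_\n" for $permsgstatus->get_uri_list();
    return 0;                     # the fake rule never actually fires
}

1;

And the matching config file:

# configfile -- load the plugin and define a fake rule that calls it
# (the score must be non-zero or SpamAssassin will skip the rule entirely)
loadplugin DumpURIs /path/to/DumpURIs.pm
body  DUMP_URIS     eval:dump_uris()
score DUMP_URIS     0.001

Keep in mind that "spamassassin" also writes the marked-up message to STDOUT, so in practice it may be more convenient to print the URIs to STDERR or to a file.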
--j.
Justin,
- a *really* simple SpamAssassin 3.0.0 plugin be written, that just dumps $scanner->get_uri_list() to STDOUT. (this is *really* easy. honest)
Let me try... just printed the MyPlugin example, let's see how easy it is :)
- create a config file that loads that plugin and sets up a fake "rule" that runs it.
k
- Then when you want to grab URLs from a spam mail, run "spamassassin -c configfile -L -t < msg" on it; to process a bigger batch of spam, use "mass-check -c configfile".
- Profit!
Sounds really nice .. eh eh
if someone does this, please let me know how they find the doco, etc., and put a page up on the SpamAssassin wiki about it... I'm trying to encourage more plugins ;)
I have just created something that does the trick, but not the fun part ... How do I tell SpamAssassin _not_ to run all the other 'default' rules it finds in the default rules path?
---
[root@fedora /home/chris/surbldumpuri]# cat sa.conf
skip_rbl_checks 1
use_bayes 0
use_pyzor 0
use_dcc 0

[root@fedora /home/chris/surbldumpuri]# spamassassin -D -L -t --siteconfigpath=. -p sa.conf < testmsg > /dev/null
---
Above is my current (very simple test) setup, though -L should prevent the use_pyzor etc. checks anyway...
Remarks/questions:
- perldoc PerMsgStatus does not contain "get_uri_list" :)
- Can I use methods from another plugin (uridnsbl) to test if URIs are already on the RBL?
bye, Chris
I have just created something that does the trick, but not the fun part ... How do I tell SpamAssassin _not_ to run all the other 'default' rules it finds in the default rules path?
use "-c" -- that overrides the "system" rules path, so it won't read the system rules at all.
- perldoc PerMsgStatus does not contain "get_uri_list" :)
oh, good point. could you open a bug about it? (these are all new "public" APIs since the addition of plugins; previously they were more private.)
- Can I use methods from another plugin (uridnsbl) to test if URIs are already on the RBL?
hmm -- that could be quite tricky! possibly, but it would be hard. You would have to run the URIBL tests in their entirety, then look up the results on that object.
btw, there's a bug in 3.0.0 that means that plugins can't be loaded from a file that isn't in the @INC search path. should be fixed by full release, though.
--j.
Jeff Chan wrote to discuss@lists.surbl.org:
On Sunday, August 22, 2004, 4:54:36 PM, Christiaan Besten wrote:
I think the most complicated part is filtering usable (non-hidden) URLs out of the received spam. I was thinking of reusing code designed by the SA crew. Has anyone tried that before?
Have not tried it, but agree it's a good approach. Message and URI parsing from spams can be non-trivial.
I posted a message here on Aug 16th with some results and comments on the approach I've used, part of which is exactly as Christiaan described. I didn't get any replies, so I didn't bother following up. Have a look at my post, and I'd be happy to share any results.
- Ryan
Hi Ryan!

That thread consumed like 100 subjects :) ... I had not seen the link to your example site. Looks impressive :) So, in the end, what did you use for message (URI) parsing?
bye, Chris
Christiaan den Besten wrote to 'SURBL Discussion list':
I posted a message here on Aug 16th with some results and comments on the approach I've used, part of which is exactly as Christiaan described. I didn't get any replies, so I didn't bother following up. Have a look at my post, and I'd be happy to share any results.
Hi Ryan!
That thread consumed like 100 subjects :) ...
Touché. :-)
I had not seen the link to your example site. Looks impressive :) So, in the end, what did you use for message (URI) parsing?
Mail::SpamAssassin::PerMsgStatus::get_uri_list($status), but there were a few other incantations that I did to get the list of URIs down. I have been meaning to publish the script, but things keep getting in the way. I will do that tomorrow (today). Stay tuned!
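(Ryan's script isn't published at this point in the thread, so purely as an illustration of the kind of reduction involved: assuming the URI module from CPAN, boiling a raw URI list down to unique host names might look something like this.)

use strict;
use URI;

# One URI per line on STDIN; print the unique, lowercased host names.
my %hosts;
while (my $u = <STDIN>) {
    chomp $u;
    my $host = eval { URI->new($u)->host };   # mailto:, bare strings etc. have no host()
    next unless defined $host;
    $hosts{lc $host} = 1;
}
print "$_\n" for sort keys %hosts;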
- Ryan
Good evening, Christiaan,
On Mon, 23 Aug 2004, Christiaan den Besten wrote:
Secondly, while doing these tests I noticed that a lot of the sites listed in (our) WS list are no longer 'alive'. Is there any clean-up procedure
For domains that have completely fallen out of the whois database, we remove them automatically, although this takes a while. The "dead" domains (domains no longer in whois) exist in the "withdead" files under http://www.stearns.org/sa-blacklist/, but are removed from all files without "withdead" in their names. I can't remove any domains that are still registered (in the whois database), because if I did, a spammer could temporarily shut off the "A" records, wait until they're out of the blacklist, and then trivially turn the A records back on.
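(Purely illustrative, not Bill's actual clean-up code: a rough sketch of that kind of liveness check, assuming the system whois(1) client is on the path and that a "No match" / "NOT FOUND"-style answer means the domain is no longer registered. Registry output formats vary by TLD, so a real script needs per-registry patterns plus rate limiting.)

#!/usr/bin/perl -w
use strict;

# Read one domain per line on STDIN and print the ones that look dead.
while (my $domain = <STDIN>) {
    chomp $domain;
    next unless $domain =~ /^[a-z0-9.-]+$/i;    # keep the shell call safe
    my $answer = `whois $domain 2>/dev/null`;
    print "$domain\n"
        if $answer =~ /no match|not found|no entries found/i;
    sleep 2;                                    # be polite to the whois servers
}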
defined yet? ... or will the list just keep on growing ;)
37,000 domains and growing. :-) Cheers, - Bill