-----Original Message-----
From: Jeff Chan [mailto:jeffc@surbl.org]
Sent: Thursday, September 02, 2004 3:24 AM
To: SATalk
Cc: SURBL Discuss
Subject: Re: Applying SURBL against blog comment spammers
On Wednesday, September 1, 2004, 11:25:40 PM, Matthew Hunter wrote:
I just whipped up some code to reject trackback/comment spam using a SURBL as a data source. Unfortunately, the people spamming my weblogs aren't in multi.surbl.org, so I will have to maintain my own local blacklist server.
The single most useful thing that could be done wrt fighting spam in weblogs would be an SURBL source that had the offending domains in it. I would offer to make mine public, but I don't have the IP to spare at the moment...
Does anyone know of an appropriate SURBL list?
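For readers unfamiliar with how such a check works, here is a minimal Python sketch of a SURBL-style lookup against multi.surbl.org. It is not Matthew's code. The helper names (`registered_domain`, `surbl_query_name`, `is_listed`, `spammy_domains`), the tiny `TWO_LEVEL_TLDS` set, and the injectable `resolve` parameter are assumptions made for illustration; a real checker would need a much fuller two-level TLD list and should respect the list's usage policy.

```python
import re
import socket

# Hypothetical, deliberately tiny set of two-level TLDs; a real checker
# needs a much fuller list to find the registered domain correctly.
TWO_LEVEL_TLDS = {"co.uk", "com.au", "com.br", "co.kr", "com.cn"}

URL_HOST_RE = re.compile(r"https?://([A-Za-z0-9.-]+)", re.IGNORECASE)


def registered_domain(host):
    """Reduce a hostname to the domain a SURBL-style list would carry."""
    labels = host.lower().strip(".").split(".")
    if len(labels) >= 3 and ".".join(labels[-2:]) in TWO_LEVEL_TLDS:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])


def surbl_query_name(domain, zone="multi.surbl.org"):
    """Build the DNS name to look up: <domain>.<zone>."""
    return f"{domain}.{zone}"


def is_listed(domain, zone="multi.surbl.org", resolve=socket.gethostbyname):
    """Return True if the lookup yields an address in 127.0.0.0/8.

    A failed lookup (for example NXDOMAIN) is treated as "not listed".
    """
    try:
        answer = resolve(surbl_query_name(domain, zone))
    except OSError:
        return False
    return answer.startswith("127.")


def spammy_domains(comment_text, **kwargs):
    """Return the listed registered domains linked from a comment."""
    hosts = URL_HOST_RE.findall(comment_text)
    domains = {registered_domain(h) for h in hosts}
    return sorted(d for d in domains if is_listed(d, **kwargs))
```

A weblog could call `spammy_domains(comment_body)` before accepting a trackback or comment and reject the submission if the result is non-empty. Pointing `zone` at a locally maintained list would work the same way.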
Hi Matthew, We could perhaps set up a separate SURBL for blog spammers. It would be a slight shift in focus since the other SURBLs are all for email spam. Can you give an idea of how many records you have?
Also have you tried Jay Allen's MT-Blacklist/Comment Spam list:
http://www.jayallen.org/comment_spam/
It would be interesting to look at your data to see if there's much overlap with our existing lists. In the case of Jay's data, there's nearly none.
Hell, I'm feeling a little saucy this morning, so let's mull this over. This goes against Jeff's thoughts. But if they are spamming, then just add them to SURBL. Does it matter if they spam email or blogs? To me, not really. Adding them to the regular SURBL is sure to cause them some pain.
Legit domains still get removed.
SO I say, go ahead and add them. However I would like to see an example of a spam'd blog. I've never seen one.
--Chris
SO I say, go ahead and add them.
Chris... where have you been? We have had extensive discussions recently where we all concluded that we have to make getting the FPs down a priority, and the only way to do this is to (gulp) allow some graymarketers loose (those who do some or much spamming but who do have some legitimate uses). It wasn't an easy decision... but the need to get SURBL to a "set it and forget it" point is mission critical.
However I would like to see an example of a spam'd blog.
Happens all the time in the comments section of these blogs. Not often with blogger.com, more often with Movable Type blogs because (1) these link to the comments section, (2) the comments section is usually permanent with MT, and (3) these comments then get indexed by search engines.
It is this last thing that is key. These blog spammers don't seriously think that they are going to sell much based on human beings viewing their spam. They just want search engine position, because Google sees a link from such-and-such blog to the spammer's web site as a "vote" by that blog for the spammer's site, thus raising the site's rankings on Google.
Mischievous... but a very powerful marketing technique!
For examples, see the following:
http://www.google.com/search?hl=en&ie=UTF-8&q=%22diet+pills%22+%22mo... ype%22
Rob McEwen
On Thursday, September 2, 2004, 6:53:41 AM, Rob McEwen wrote:
SO I say, go ahead and add them.
Chris... where have you been? We have had extensive discussions recently where we all concluded that we have to make getting the FPs down a priority, and the only way to do this is to (gulp) allow some graymarketers loose (those who do some or much spamming but who do have some legitimate uses). It wasn't an easy decision... but the need to get SURBL to a "set it and forget it" point is mission critical.
I agree about reducing FPs, but I'm not sure if adding blog spammers would increase or decrease that. I guess I'm not familiar with them.
Are the blog spammers scammers and criminals like professional mail spammers? Do they advertise legitimate sites, or the usual pill and mortgage sites in Korea, China and Brazil?
Jeff C.
Hi!
I agree about reducing FPs, but I'm not sure if adding blog spammers would increase or decrease that. I guess I'm not familiar with them.
Are the blog spammers scammers and criminals like professional mail spammers? Do they advertise legitimate sites, or the usual pill and mortgage sites in Korea, China and Brazil?
Yes they are. Last night I had a meeting with the author of Pivot, a well-known blog tool, and he could also point us towards some, I think...
He has some features to remove postings from those guys, so I guess he can also compile lists of those...
Bye, Raymond
NOTE: I sent this again because it didn't seem to get e-mailed (for whatever reason, sorry if you get it twice.)
Given the lack of commonality, it may not make much sense to add to the mail spam lists, since it would be an extra 2000+ records that would probably not get hits on mail.
The MT-Blacklist doesn't seem to update too frequently (the last new record was from 8/29) and has about 2000 records. Matthew's list was pretty sparse so far. So I'm still pondering things.
Jeff,
I could be totally wrong... but I suspect that the lack of commonality may have more to do with the MT-blacklist not being updated as frequently, or as extensively.
A good test may be to "Google" some more frequently found SURBL-blocked domains which are not on the MT-blacklist along with the phrase "moveable type" ...you might find a lot of blog-comment spam which should have been on the MT-blacklist and is already in SURBL.
Moreover, blog spam is a huge issue and this presents a great opportunity for SURBL to make a big splash, all for a good purpose. If you get the "big-time" bloggers to write about SURBL, you will soon find SURBL on the front page of CNN or Fox News, etc.
Rob McEwen
On Thursday, September 2, 2004, 7:28:16 AM, Rob McEwen wrote: (Jeff Chan wrote:)
Given the lack of commonality, it may not make much sense to add to the mail spam lists, since it would be an extra 2000+ records that would probably not get hits on mail.
The MT-Blacklist doesn't seem to update too frequently (the last new record was from 8/29) and has about 2000 records. Matthew's list was pretty sparse so far. So I'm still pondering things.
I could be totally wrong... but I suspect that the lack of commonality may have more to do with the MT-blacklist not being updated as frequently, or as extensively.
A good test may be to "Google" some more frequently found SURBL-blocked domains which are not on the MT-blacklist along with the phrase "moveable type" ...you might find a lot of blog-comment spam which should have been on the MT-blacklist and is already in SURBL.
Hmm, interesting. Other comments also seem to suggest that the blog spammers are sometimes the same as mail spammers. So maybe there should be more overlap, but additions to MT-blacklist are slower than SURBLs.
On the other hand, our databases are pretty far reaching and should have hit on even older blog spam domains, yet they largely didn't.
The quick and easy answer, which may be wrong, is that they're different folks, or at least different domains.
Jeff C.
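The overlap question discussed above can be made concrete with a small comparison of two domain lists. The Python sketch below assumes both lists have already been reduced to plain domain names; the function names `normalize` and `overlap_report` and the sample domains in the usage note are hypothetical and are not taken from SURBL or MT-Blacklist data.

```python
def normalize(domains):
    """Lowercase, strip whitespace and trailing dots, drop blank entries."""
    return {d.strip().lower().rstrip(".") for d in domains if d.strip()}


def overlap_report(blog_list, mail_list):
    """Return the shared domains and the share of blog_list they cover."""
    blog = normalize(blog_list)
    mail = normalize(mail_list)
    shared = blog & mail
    fraction = len(shared) / len(blog) if blog else 0.0
    return sorted(shared), fraction
```

For example, `overlap_report(["Pills.example", "casino.example"], ["pills.example", "mortgage.example"])` reports one shared domain covering half of the blog list. A fraction near zero on real data would be consistent with the "different folks, or at least different domains" reading, though it would not by itself rule out slow updates on one side.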
On Thursday, September 2, 2004, 7:08:47 AM, Raymond Dijkxhoorn wrote:
Are the blog spammers scammers and criminals like professional mail spammers? Do they advertise legitimate sites, or the usual pill and mortgage sites in Korea, China and Brazil?
Yes they are.
Are they advertising legitimate sites or bad guy sites?
Jeff C.
Jeff Chan wrote:
Are they advertising legitimate sites or bad guy sites?
Gambling sites, "pillz" sites, etc. The usual.
More insidious are the ones that link to legit blogs that have already been spammed, as described here: http://photomatt.net/2004/08/01/weeds-in-the-garden/
On Thursday, September 2, 2004, 9:24:19 AM, Kelson Kelson wrote:
Jeff Chan wrote:
Are they advertising legitimate sites or bad guy sites?
Gambling sites, "pillz" sites, etc. The usual.
Thanks. Sounds like there are some definite bad guys spamming blogs.
More insidious are the ones that link to legit blogs that have already been spammed, as described here: http://photomatt.net/2004/08/01/weeds-in-the-garden/
From that interview:
Alex told me the other day about a new type of comment spam he's been seeing: comments that link to normal blog entries. Well known blogs like Mozillazine. As advanced as tools like MT Blacklist have become, they're pretty useless in cases like this. Are you going to blacklist Dave Sifry?
Hmm, we definitely don't want to block legitimate blog sites...
If we make a list of blog spams, how do we prevent that from happening? Whitelists of every legitimate blog site? That's probably nearly as hard to gather as whitelists of every legitimate web site. ;-)
Jeff C.
On Thursday, September 2, 2004, 6:36:29 AM, Chris Santerre wrote:
From: Jeff Chan [mailto:jeffc@surbl.org]
We could perhaps set up a separate SURBL for blog spammers. It would be a slight shift in focus since the other SURBLs are all for email spam. Can you give an idea of how many records you have?
Also have you tried Jay Allen's MT-Blacklist/Comment Spam list:
http://www.jayallen.org/comment_spam/
It would be interesting to look at your data to see if there's much overlap with our existing lists. In the case of Jay's data, there's nearly none.
Hell, I'm feeling a little saucy this morning, so let's mull this over. This goes against Jeff's thoughts. But if they are spamming, then just add them to SURBL. Does it matter if they spam email or blogs? To me, not really. Adding them to the regular SURBL is sure to cause them some pain.
Legit domains still get removed.
I'd probably lean towards a separate list if we set one up, since the data are of a different source type (web logs vs. mail) and use. It would be a convenience for blog maintainers. It might be interesting to see how a blog spam list would do against mail spam, but judging by the lack of overlap, I would predict it would not be very relevant against mail.
Given the lack of commonality, it may not make much sense to add to the mail spam lists, since it would be an extra 2000+ records that would probably not get hits on mail.
The MT-Blacklist doesn't seem to update too frequently (the last new record was from 8/29) and has about 2000 records. Matthew's list was pretty sparse so far. So I'm still pondering things.
Comments welcome!
Jeff C.
On Thu, Sep 02, 2004 at 09:36:29AM -0400, Chris Santerre csanterre@MerchantsOverseas.com wrote:
-----Original Message-----
From: Jeff Chan [mailto:jeffc@surbl.org]
Sent: Thursday, September 02, 2004 3:24 AM
To: SATalk
Cc: SURBL Discuss
Subject: Re: Applying SURBL against blog comment spammers
On Wednesday, September 1, 2004, 11:25:40 PM, Matthew Hunter wrote:
I just whipped up some code to reject trackback/comment spam using a SURBL as a data source. Unfortunately, the people spamming my weblogs aren't in multi.surbl.org, so I will have to maintain my own local blacklist server.
The single most useful thing that could be done wrt fighting spam in weblogs would be an SURBL source that had the offending domains in it. I would offer to make mine public, but I don't have the IP to spare at the moment...
Does anyone know of an appropriate SURBL list?
Hi Matthew, We could perhaps set up a separate SURBL for blog spammers. It would be a slight shift in focus since the other SURBLs are all for email spam. Can you give an idea of how many records you have?
Also have you tried Jay Allen's MT-Blacklist/Comment Spam list:
http://www.jayallen.org/comment_spam/
It would be interesting to look at your data to see if there's much overlap with our existing lists. In the case of Jay's data, there's nearly none.
Hell, I'm feeling a little saucy this morning, so let's mull this over. This goes against Jeff's thoughts. But if they are spamming, then just add them to SURBL. Does it matter if they spam email or blogs? To me, not really. Adding them to the regular SURBL is sure to cause them some pain.
Legit domains still get removed.
SO I say, go ahead and add them. However I would like to see an example of a spam'd blog. I've never seen one.
Here are some examples of trackback spam (trackback is perhaps best thought of as an automated hat-tip protocol). Let me know when you've seen them so I can delete them. These are new since sometime yesterday, I think (the last time I deleted this stuff). My SURBL update hasn't been posted to this site yet or it would have stopped these.
http://www.triggerfinger.org/weblog/servlet/trackback/164.jsp
http://www.triggerfinger.org/weblog/servlet/trackback/449.jsp
http://www.triggerfinger.org/weblog/servlet/trackback/2799.jsp
http://www.triggerfinger.org/weblog/servlet/trackback/3947.jsp
http://www.triggerfinger.org/weblog/servlet/trackback/5053.jsp
http://www.triggerfinger.org/weblog/servlet/trackback/5324.jsp
http://www.triggerfinger.org/weblog/servlet/trackback/5484.jsp
http://www.triggerfinger.org/weblog/servlet/trackback/5519.jsp
http://www.triggerfinger.org/weblog/servlet/trackback/5556.jsp
There's no standard comment API, so I haven't fallen victim to that yet. Other bloggers have, but usually delete the comments ASAP... For comments, though, the simpler solution is probably to require an active user session (e.g., a session cookie accepted and returned from an earlier page). A spammer can still handle that programmatically, but it's harder. Parsing the comments for signs of spam, as is done for email, is, I think, inevitable in the long term. Well, that or requiring accounts to post comments.
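The session idea above could be sketched roughly as follows in Python: the server hands out a signed token (for example as a cookie) when it serves the entry page, and accepts a comment only if a valid, unexpired token comes back. This is an illustration under stated assumptions, not Matthew's implementation; `SECRET_KEY`, `MAX_AGE_SECONDS`, `issue_token`, and `token_is_valid` are all hypothetical names and values.

```python
import hashlib
import hmac
import time

# Assumed server-side secret; in practice load it from configuration.
SECRET_KEY = b"change-me"
MAX_AGE_SECONDS = 3600  # assumed token lifetime


def _sign(payload):
    """HMAC-SHA256 of the payload, hex encoded."""
    return hmac.new(SECRET_KEY, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def issue_token(entry_id, now=None):
    """Create a token to hand out when the entry page is served."""
    issued = int(time.time() if now is None else now)
    payload = f"{entry_id}:{issued}"
    return f"{payload}:{_sign(payload)}"


def token_is_valid(token, entry_id, now=None):
    """Accept only a genuine, fresh token issued for this entry."""
    try:
        token_entry, issued_text, signature = token.rsplit(":", 2)
        issued = int(issued_text)
    except (AttributeError, ValueError):
        return False
    expected = _sign(f"{token_entry}:{issued_text}")
    # Compare as bytes in constant time to avoid leaking signature details.
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8")):
        return False
    current = time.time() if now is None else now
    return token_entry == str(entry_id) and 0 <= current - issued <= MAX_AGE_SECONDS
```

As the message says, a determined spammer can still fetch the page first and replay the token, so this raises the cost of automated posting rather than eliminating it.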
Matthew Hunter writes:
Here are some examples of trackback spam (trackback is perhaps best thought of as an automated hat-tip protocol). Let me know when you've seen them so I can delete them. These are new since sometime yesterday, I think (the last time I deleted this stuff). My SURBL update hasn't been posted to this site yet or it would have stopped these.
http://www.triggerfinger.org/weblog/servlet/trackback/164.jsp
http://www.triggerfinger.org/weblog/servlet/trackback/449.jsp
http://www.triggerfinger.org/weblog/servlet/trackback/2799.jsp
http://www.triggerfinger.org/weblog/servlet/trackback/3947.jsp
http://www.triggerfinger.org/weblog/servlet/trackback/5053.jsp
http://www.triggerfinger.org/weblog/servlet/trackback/5324.jsp
http://www.triggerfinger.org/weblog/servlet/trackback/5484.jsp
http://www.triggerfinger.org/weblog/servlet/trackback/5519.jsp
http://www.triggerfinger.org/weblog/servlet/trackback/5556.jsp
! I hadn't seen trackback spam before...
There's no standard comment API, so I haven't fallen victim to that yet. Other bloggers have, but usually delete the comments ASAP... For comments, though, the simpler solution is probably to require an active user session (e.g., a session cookie accepted and returned from an earlier page). A spammer can still handle that programmatically, but it's harder. Parsing the comments for signs of spam, as is done for email, is, I think, inevitable in the long term. Well, that or requiring accounts to post comments.
Sample comment spams are easy enough to find. Google for "comments movable cialis" ;) Here's one:
http://patch.stanford.edu/MT/mt-comments.cgi?entry_id=4
--j.