I'm seeing a new spam variant that is clearly designed to get past SURBL. It is an HTML message that contains many (50~100) 'invisible' links; links that have no target text, just: <A href="http://garbage.sitename.tld"></A>
The intention is clear: they want to fill up the 20 'slots' of the spamcop_uri_limit with their junk links so the real "payload" URL can slip past unchecked. It's a statistical game: if the payload has only a 1-in-20 chance of being among the URIs picked by the randomizer, then 95% of the payloads slip by.
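For concreteness, the arithmetic behind that game can be sketched in a few lines of Python (a hypothetical model of uniform random sampling, not SpamCopURI's actual selection code):

# Chance that the lone "payload" URI lands in the checker's random sample
# of `limit` URIs, out of `total` URIs in the message (uniform sampling
# without replacement -- an assumed model for illustration).
def p_payload_checked(total, limit=20):
    return min(1.0, float(limit) / total)

print(p_payload_checked(100))   # 0.2  -- with ~100 links, the payload is still checked 20% of the time
print(p_payload_checked(400))   # 0.05 -- it takes ~400 links before 95% slip by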
To add insult to injury, they're tossing in random "\r" (ASCII-CR) characters into the "payload" hostname to try to break SpamAssassin's URI parsing.
Is it time to create rules to penalize large numbers of 'invisible' links?
The one thing that has me worried is that people may just crank up the spamcop_uri_limit value as a brute-force response to this trash (or use a simple-minded client that has no such limit at all). That will add an ever-increasing load on the SURBL DNS servers. I'm already seeing a steady-state average of 130 queries/second against my two servers (with spikes in the 150~175 range). The trend has been a steady increase (it passed the 100 Q/S mark last fall).
David B Funk wrote:
I'm seeing a new spam variant that is clearly designed to get past SURBL. It is an HTML message that contains many (50~100) 'invisible' links; links that have no target text, just: <A href="http://garbage.sitename.tld"></A>
Is it time to create rules to penalize large numbers of 'invisible' links?
It would also be good to discard pointless links before querying SURBLs; I'm not sure how easy that will be to code, though.
On Tuesday, February 22, 2005, 6:19:06 AM, Robert Brooks wrote:
David B Funk wrote:
I'm seeing a new spam variant that is clearly designed to get past SURBL. It is an HTML message that contains many (50~100) 'invisible' links; links that have no target text, just: <A href="http://garbage.sitename.tld"></A>
Is it time to create rules to penalize large numbers of 'invisible' links?
It would also be good to discard pointless links before querying SURBLs; I'm not sure how easy that will be to code, though.
Yes, there is a SpamAssassin Bugzilla entry with a feature request to ignore unclickable URIs:
http://bugzilla.spamassassin.org/show_bug.cgi?id=3976
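For anyone who wants to experiment before that feature lands, here is a rough sketch of the idea in Python (my own illustration, not the code attached to that bug; payload.example.tld is a made-up domain):

import re

# <a ... href="URL" ...>anchor body</a>, case-insensitive, dot matches newlines
ANCHOR_RE = re.compile(
    r'<a\b[^>]*\bhref\s*=\s*["\']?([^"\'\s>]+)[^>]*>(.*?)</a>',
    re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r'<[^>]+>')   # used to strip tags nested inside the anchor

def clickable_urls(html):
    """Return only the hrefs whose anchors contain visible text."""
    urls = []
    for href, body in ANCHOR_RE.findall(html):
        if TAG_RE.sub('', body).strip():   # skip invisible links: <a href=...></a>
            urls.append(href)
    return urls

sample = ('<A href="http://garbage.sitename.tld"></A>'
          '<a href="http://payload.example.tld">click here</a>')
print(clickable_urls(sample))   # ['http://payload.example.tld']

One caveat: this also discards links whose anchor is an image, so a real implementation would need to treat <img> bodies as clickable.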
Jeff C. -- "If it appears in hams, then don't list it."
On Tue, 22 Feb 2005 04:35:51 -0600 (CST), David B Funk dbfunk@engineering.uiowa.edu wrote:
I'm seeing a new spam variant that is clearly designed to get past SURBL. It is an HTML message that contains many (50~100) 'invisible' links; links that have no target text, just: <A href="http://garbage.sitename.tld"></A>
The intention is clear: they want to fill up the 20 'slots' of the spamcop_uri_limit with their junk links so the real "payload" URL can slip past unchecked. It's a statistical game: if the payload has only a 1-in-20 chance of being among the URIs picked by the randomizer, then 95% of the payloads slip by.
To add insult to injury, they're tossing in random "\r" (ASCII-CR) characters into the "payload" hostname to try to break SpamAssassin's URI parsing.
Because of all the games played to break the parser, I discussed an idea a while back on the SpamCop newsgroups: use Java (or some other API, maybe Internet Explorer's) to render a spam's HTML into a virtual page and then scan its document objects, post-HTML-parsing, one at a time for links. That is close to what a user would actually "see" in a browser.
I have a hunch that "null" links, strange parsing tricks, and the like would be handled correctly by a DOM parser for HTML, but I've never tested it for lack of time. A Java API could be called under Linux, but IE's? Just an idea... I'm sure the spammers could figure out a way around that method, too. But the trick is that their HTML still has to render correctly for the user, or the spam doesn't work.
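The same idea can be illustrated with any real HTML parser; here is a minimal sketch using Python's standard library (my illustration, not the proposed Java/IE approach itself). After parsing, null anchors and mangled markup come out normalized, much as a browser's DOM would present them:

from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect (href, visible text) pairs the way a browser's DOM would
    see them, after the parser has normalized the markup."""
    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the anchor we are inside, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self._href = dict(attrs).get('href')
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._href is not None:
            self.links.append((self._href, ''.join(self._text).strip()))
            self._href = None

p = LinkExtractor()
p.feed('<A href="http://garbage.sitename.tld"></A>'
       '<a href="http://real.example.tld">Buy <b>now</b></a>')
print(p.links)
# [('http://garbage.sitename.tld', ''), ('http://real.example.tld', 'Buy now')]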
"David B Funk" dbfunk@engineering.uiowa.edu wrote:
I'm seeing a new spam variant that is clearly designed to get past SURBL. It is an HTML message that contains many (50~100) 'invisible' links; links that have no target text, just: <A href="http://garbage.sitename.tld"></A>
In my spamfilter I check for this pattern and penalise any mail that includes <a href=...></a> with no anchor text (you have to be careful with the parsing, though, so as not to penalise <a name="URI"></a>, which is legitimate).
Also quite common is a single non-alphabetic character as the anchor text, e.g.:
<a href="URI">'</a> <a href="URI">.</a>
etc.
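A minimal sketch of those two checks (my own illustration, not jwSpamSpy's actual code):

import re

# <a ...attributes...>anchor body</a>, capturing both parts
A_TAG_RE = re.compile(r'<a\b([^>]*)>(.*?)</a>', re.IGNORECASE | re.DOTALL)

def suspicious_anchor_count(html):
    """Count anchors that are empty or carry a lone punctuation character.
    Anchors with a name= but no href= (legitimate fragment targets, e.g.
    <a name="section1"></a>) are not penalised."""
    count = 0
    for attrs, body in A_TAG_RE.findall(html):
        if not re.search(r'\bhref\s*=', attrs, re.IGNORECASE):
            continue   # <a name="..."></a> and friends are fine
        text = re.sub(r'<[^>]+>', '', body).strip()
        if text == '' or (len(text) == 1 and not text.isalnum()):
            count += 1
    return count

sample = ('<a name="top"></a>'
          '<a href="http://junk1.example"></a>'
          "<a href=\"http://junk2.example\">'</a>")
print(suspicious_anchor_count(sample))   # 2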
To add insult to injury, they're tossing in random "\r" (ASCII-CR) characters into the "payload" hostname to try to break SpamAssassin's URI parsing.
I strip out any CR/LF characters between the opening and closing double quote of a <a href=...> URL.
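In Python terms, that cleanup might look like this (a sketch under the assumption that the URL is enclosed in double quotes, as in Joe's description):

import re

HREF_RE = re.compile(r'(href\s*=\s*")([^"]*)(")', re.IGNORECASE)

def strip_crlf_in_hrefs(html):
    """Remove CR/LF characters injected inside quoted href values, so
    "http://pay\rload.example" parses as http://payload.example."""
    return HREF_RE.sub(
        lambda m: m.group(1)
                  + m.group(2).replace('\r', '').replace('\n', '')
                  + m.group(3),
        html)

print(strip_crlf_in_hrefs('<a href="http://pay\rload.example/x">hi</a>'))
# <a href="http://payload.example/x">hi</a>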
The next update of jwSpamSpy for Windows will query SURBL, which brings things full circle, since jwSpamSpy is the tool that extracts and provides much of SURBL's JP domain data feed :-)
Joe Wein
On Wednesday, February 23, 2005, 3:51:08 AM, Joe Wein wrote:
The next update of jwSpamSpy for Windows will query SURBL, which brings things full circle, since jwSpamSpy is the tool that extracts and provides much of SURBL's JP domain data feed :-)
Hi Joe, while it's nice that you want to build SURBLs into jwSpamSpy, we somewhat prefer that message processing be done in mail servers rather than in mail clients, particularly to minimize name server hits.
Jeff C. -- "If it appears in hams, then don't list it."
On Wednesday, February 23, 2005, 8:03:54 AM, Frank Ellermann wrote:
Jeff Chan wrote:
we somewhat prefer that message processing be done in mail servers rather than in mail clients, particularly to minimize name server hits.
Are DNS caches used by MTAs "better" than other DNS caches?
Yes, because they see more mail on a single server and therefore should have higher cache hit rates.
Jeff C. -- "If it appears in hams, then don't list it."
Jeff Chan wrote:
Are DNS caches used by MTAs "better" than other DNS caches?
Yes, because they see more mail on a single server and therefore should have higher cache hit rates.
I use the DNS server(s) assigned by my ISP(s), and the MX(s) of my ISP(s) could be using the same DNS servers. Maybe not at a very big ISP like T-Online, but for claranet.de they're probably the same servers. Bye, Frank
Jeff Chan wrote:
On Wednesday, February 23, 2005, 8:03:54 AM, Frank Ellermann wrote:
...
Are DNS caches used by MTAs "better" than other DNS caches?
Yes, because they see more mail on a single server and therefore should have higher cache hit rates.
Also, many spams are sent in a single message to many recipients. If the check is done at the mail server, only one query per URL is needed. At the same time, big mail servers often run a caching DNS server on the same machine, so queries are faster.
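A toy sketch of why that helps (illustration only, with a stand-in resolver): within one TTL window, a shared per-server cache answers every repeat lookup locally.

import time

class TTLCache:
    """Tiny DNS-style cache: one real lookup per domain per TTL window."""
    def __init__(self, ttl=900):          # SURBL's TTL is ~15 minutes
        self.ttl = ttl
        self.entries = {}                 # domain -> (expiry, result)
        self.real_queries = 0

    def lookup(self, domain, resolve):
        now = time.time()
        hit = self.entries.get(domain)
        if hit and hit[0] > now:
            return hit[1]                 # served from cache, no DNS traffic
        self.real_queries += 1
        result = resolve(domain)
        self.entries[domain] = (now + self.ttl, result)
        return result

cache = TTLCache()
fake_resolve = lambda d: 'listed'         # stand-in for the real DNS query
# Same spam sent to 1000 recipients on one server: one query, 999 cache hits.
for _ in range(1000):
    cache.lookup('spamdomain.example.multi.surbl.org', fake_resolve)
print(cache.real_queries)                 # 1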
Joe
"Jeff Chan" jeffc@surbl.org wrote
Hi Joe, while it's nice that you want to build SURBLs into jwSpamSpy, we somewhat prefer that message processing be done in mail servers rather than in mail clients, particularly to minimize name server hits.
Hi Jeff,
I appreciate those concerns, but I have tried to address them in my client design as much as possible, to minimize any impact on SURBL compared to a server-based approach:
1) I only use SURBL if I can't verify an email as ham or spam by any other method, such as sender whitelists, local domain blacklists, SBL records, ratware signatures, etc. jwSpamSpy is capable of detecting and tracking most of the pill / porn / warez domains without having to resort to SURBL -- after all, it is the engine that provides a lot of the SURBL data :-)
2) I keep an extensive local whitelist of domains (several thousand) that are never queried externally.
3) I perform local DNS caching, which eliminates a lot of duplicate queries on the wire.
4) After that, queries go through the provider's DNS server, which caches data; the client never connects directly to multi.surbl.org.
5) By default my filter checks for new mail every 10 minutes, less than the 15-minute TTL on SURBL records, so it's quasi-realtime. There should therefore be no major time lag between delivery to the ISP mailserver and the SURBL query issued by the client, which I think was your main concern with a client-based approach. My mail polling interval is conveniently shorter than the SURBL TTL :-)
Having said that, my long-term plan is also to offer a server-based solution that could run on Linux and other platforms. The existing client already supports multiple POP accounts on the same box, which all go through the same DNS caching, etc., as they would in a server-based version; the overall check order is sketched below.
Joe
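In outline, the check order Joe describes might look like this (a paraphrase in code; the function and list names are placeholders, not jwSpamSpy internals):

def classify(domain, whitelist, local_blacklist, dns_cache, surbl_query):
    """Short-circuit cheap local checks before generating any SURBL traffic."""
    if domain in whitelist:            # steps 1-2: local lists, no network at all
        return 'ham'
    if domain in local_blacklist:
        return 'spam'
    verdict = dns_cache.get(domain)    # step 3: local DNS cache
    if verdict is None:
        verdict = surbl_query(domain)  # step 4: provider resolver -> SURBL
        dns_cache[domain] = verdict
    return verdict

# usage with toy data:
cache = {}
print(classify('good.example', {'good.example'}, set(), cache, lambda d: 'spam'))  # 'ham'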
On Wednesday, February 23, 2005, 5:55:19 PM, Joe Wein wrote:
"Jeff Chan" jeffc@surbl.org wrote
Hi Joe, While it's nice that you want to build SURBLs into jwSpamSpy, we somewhat prefer that message processing be done in mail servers instead of mail clients particularly in order to minimize name server hits.
Hi Jeff,
I appreciate those concerns, but I have tried to address them in my client design as much as possible, to minimize any impact on SURBL compared to a server-based approach:
- I only use SURBL if I can't verify an email as ham or spam using any
other methods, such as sender whitelists, local domain blacklists, SBL records, ratware signatures, etc. jwSpamSpy is capable of detecting and tracking most of the pill / porn / warez domains etc. without having to resort to SURBL -- after all is the engine that provides a lot of the SURBL data :-)
- I keep an extensive local whitelist of domains (several 1000s) which are
never externally queried
- I perform local DNS caching, which will eliminate a lot of duplicate
queries on the wire
- After that I go through the DNS server of the provider which will cache
data; the client never directly connects to multi.surbl.org.
- By default my filter checks for new mail every 10 minutes, less than the
15 minute TTL on SURBL. It's quasi-realtime. Therefore there should be no major time lag between delivery to the ISP mailserver and the SURBL query issued by the client, which I think was your main concern with a client-based approach. My mail polling interval is conveniently shorter than the SURBL TTL :-)
Having said that, my long-term plan is to also to offer a server-based solution that could run on Linux and other platforms. The existing client already supports multiple pop accounts on the same box, which obviously all go through the same DNS caching, etc. as they would on a server-based version.
Joe
Thanks Joe, it all sounds pretty reasonable to me.
Jeff C. -- "If it appears in hams, then don't list it."
Speaking of anti-SURBL tactics, I got this turdlet today (snippet of HTML email below):
<DIV>We are giving out Free Import / Export / Wholesales/ Distributers / Retailers Contact Database</DIV>
<DIV> </DIV>
<DIV>If You interested Pls get at Following URL</DIV>
<DIV> </DIV>
<DIV><A onmouseover="window.status='http://www.impexp-data.com';return true;" onmouseout="window.status=' ';return true;" href="http://indigisys.com/chawla1/open.htm" target=_blank>Business Database</A> </DIV>
<DIV> </DIV>
<DIV>Free Business / Marketing Tools ( Free SMS to All over world Unlimited ) </DIV>
<DIV><A onmouseover="window.status='http://www.impexp-data.com/sms';return true;" onmouseout="window.status=' ';return true;" href="http://indigisys.com/chawla1/open.htm" target=_blank>FREE SMS Tools </A></DIV>
It *looks* like whoever owns indigisys.com wants to hide the fact that the links actually go to indigisys.com by pretending to point at impexp-data.com, which doesn't exist. Does SURBL's lookup code catch this?
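One way a filter can catch it (a sketch, not SURBL's or SpamAssassin's actual extraction code) is to pull every URL out of the raw markup, event-handler JavaScript included, so both the decoy and the real destination get queried:

import re

# Grab every http(s) URL anywhere in the markup -- href values *and*
# JavaScript strings in handlers like onmouseover="window.status=..."
URL_RE = re.compile(r"""https?://[^\s"'<>;]+""", re.IGNORECASE)

tag = ('<A onmouseover="window.status=\'http://www.impexp-data.com\';'
       'return true;" href="http://indigisys.com/chawla1/open.htm">x</A>')
print(sorted(set(URL_RE.findall(tag))))
# ['http://indigisys.com/chawla1/open.htm', 'http://www.impexp-data.com']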
On Monday, March 7, 2005, 9:07:37 PM, Steven Champeon wrote:
Speaking of anti-SURBL tactics, I got this turdlet today (snippet of HTML email below):
<DIV>We are giving out Free Import / Export / Wholesales/ Distributers / Retailers Contact Database</DIV>
<DIV> </DIV>
<DIV>If You interested Pls get at Following URL</DIV>
<DIV> </DIV>
<DIV><A onmouseover="window.status='http://www.impexp-data.com';return true;" onmouseout="window.status=' ';return true;" href="http://indigisys.com/chawla1/open.htm" target=_blank>Business Database</A> </DIV>
<DIV> </DIV>
<DIV>Free Business / Marketing Tools ( Free SMS to All over world Unlimited ) </DIV>
<DIV><A onmouseover="window.status='http://www.impexp-data.com/sms';return true;" onmouseout="window.status=' ';return true;" href="http://indigisys.com/chawla1/open.htm" target=_blank>FREE SMS Tools </A></DIV>
It *looks* like whoever owns indigisys.com wants to hide the fact that the links actually go to indigisys.com by pretending to point at impexp-data.com, which doesn't exist. Does SURBL's lookup code catch this?
SpamAssassin 2.64 running SpamCopURI seems to check both domains:
debug: checking url: http://indigisys.com/chawla1/open.htm
debug: returning cached data : indigisys.com.multi.surbl.org -> ARRAY(0x9351f4c)
debug: Receieved match prefix: 127.0.0
debug: Receieved mask: 32
debug: no match
debug: checking url: http://www.impexp-data.com%27;return
debug: returning cached data : impexp-data.com.multi.surbl.org -> ARRAY(0x9386f58)
debug: Receieved match prefix: 127.0.0
debug: Receieved mask: 32
As does SpamAssassin 3.0.1:
debug: URIDNSBL: query for indigisys.com took 0 seconds to look up (multi.surbl.org.:indigisys.com)
debug: URIDNSBL: query for impexp-data.com took 0 seconds to look up (multi.surbl.org.:impexp-data.com)
Those are the only SURBL applications I have easy access to, so I don't know how other tools handle this case, but SpamAssassin does the right thing. :-)
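For reference, the underlying check is just an A-record lookup of the registered domain prepended to multi.surbl.org. A minimal sketch using only the Python standard library (the 127.0.0.x bitmask interpretation follows SURBL's published conventions; the exact bit meanings are documented on surbl.org and may change):

import socket

def surbl_lookup(domain):
    """Query multi.surbl.org for an already-reduced registered domain.
    Returns the 127.0.0.x answer if listed, or None on NXDOMAIN. The
    last octet is a bitmask identifying which component lists matched."""
    try:
        return socket.gethostbyname(domain + '.multi.surbl.org')
    except socket.gaierror:        # not listed
        return None

# e.g. surbl_lookup('indigisys.com') returned no match in the debug output
# above; a listed domain would come back as something like '127.0.0.2'.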
Jeff C. -- "If it appears in hams, then don't list it."