Good afternoon, all,
On Wed, 5 May 2004, Jose-Marcio.Martins@ensmp.fr wrote:
Raymond Dijkxhoorn wrote:
Take a look at this SPAM :
http://www.ensmp.fr/~martins/Prozac
Mainly, check the source.
The problem is that it comes with many, many URLs. At the beginning, there are the URLs needed by the SPAM itself. After those, it adds many URLs with the font size set to 1. Most of these latter domains aren't spam... 8-)
I call this spamchaff: useless links thrown in just to throw off spam fighters like ourselves. Some of those hyperlinks are simply [a href="...."][/a] with nothing between the opening and closing tags. I specifically check for those "empty links" before even starting the normal process of extracting URLs. Another common pattern is three link pairs, each surrounding a single character of punctuation.
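To make the "empty link" check concrete, here is a minimal sketch in Python. It is an illustration only, not the actual extraction script: the names LinkSorter and sort_links are made up for this example, and the rule "no visible text, or a single punctuation character, and no image inside the anchor" is an assumed reading of the two patterns described above. It does not attempt the font-size-1 case.

```python
import string
from html.parser import HTMLParser


class LinkSorter(HTMLParser):
    """Split anchor hrefs into real links and likely chaff (assumed heuristic)."""

    def __init__(self):
        super().__init__()
        self.real = []           # hrefs that wrap visible text or an image
        self.chaff = []          # hrefs that wrap nothing, or one punctuation char
        self._href = None        # href of the anchor we are currently inside
        self._text = []          # text fragments seen inside that anchor
        self._has_image = False  # an <img> inside the anchor counts as content

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []
            self._has_image = False
        elif tag == "img" and self._href is not None:
            self._has_image = True

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag != "a" or self._href is None:
            return
        visible = "".join(self._text).strip()
        is_empty = not visible
        is_punct = len(visible) == 1 and visible in string.punctuation
        if not self._has_image and (is_empty or is_punct):
            self.chaff.append(self._href)
        else:
            self.real.append(self._href)
        self._href = None


def sort_links(html):
    """Return (real, chaff) lists of hrefs found in an HTML fragment."""
    parser = LinkSorter()
    parser.feed(html)
    parser.close()
    return parser.real, parser.chaff
```

Only the hrefs in the first list would then go on to the normal URL extraction; the second list is worth eyeballing rather than discarding blindly, since a spammer could also hide a real payload URL behind a single character.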
Who cares, it's picked up anyway, BIGEVIL_URI_RBL and WS_URI_RBL! :)
Scripts extracting URLs to insert into a blacklist **should take care** with the URLs they extract.
Agreed, and this is why I've spent a _lot_ of time working on the scripts that extract URLs, so that I can spot these and keep them off the list entirely. It's also why I'm finding I have less and less patience with other people's scripts as these slip into their submissions.
Cheers,
- Bill
---------------------------------------------------------------------------
The web page you seek cannot be found here: countless others await
(Courtesy of John Sage jsage@finchhaven.com)
--------------------------------------------------------------------------
William Stearns (wstearns@pobox.com). Mason, Buildkernel, freedups, p0f,
rsync-backup, ssh-keyinstall, dns-check, more at: http://www.stearns.org
--------------------------------------------------------------------------