On Wednesday, April 14, 2004, 4:29:36 PM, Markus Zingg wrote:
In order to parse the e-mails and split them up MIME-wise etc., I wrote a special parser which does everything that's needed in one single pass (MIME parsing, content transfer decoding, decoding of hex and decimal encoded HTML areas, etc.) and parses the textual parts, skipping attachments etc. The parser of course takes into account whether it is working on an HTML part or a plain text part and can't be fooled by the tricks the spammers have used so far. It works extremely efficiently and hence I thought it might be of interest to the SURBL project. As a side effect it can also detect dangerous attachments by looking at the filename extension and, if configured to do so, rename them on the fly. I figure though that this latter part might not go together well with SpamAssassin.
Since it was written for this embedded hardware there is some effort needed to make it of general use but I partially did that already in order to test the filter more efficiently with thousands of spam samples I collected over the years and of course also spam that currently is coming in.
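For readers who want a concrete picture of the steps described above (MIME splitting, content transfer decoding, decoding of hex and decimal HTML character references, skipping attachments), here is a minimal sketch in Python. It is not Markus's parser, which is a single-pass design written for embedded hardware; it simply leans on the Python standard library, and the names URL_RE and extract_hosts are made up for illustration.

```python
import email
import html
import re
from email import policy
from urllib.parse import urlsplit

# Hypothetical, deliberately simple pattern: only http/https URLs are matched.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def extract_hosts(raw_message: bytes) -> set:
    """Return the lowercased host names of URLs found in the textual parts."""
    msg = email.message_from_bytes(raw_message, policy=policy.default)
    hosts = set()
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no text of their own
        if part.get_content_maintype() != "text":
            continue  # images, binaries, etc. are not scanned
        if part.get_content_disposition() == "attachment":
            continue  # skip attachments, even textual ones
        try:
            # get_content() undoes base64 / quoted-printable and the charset.
            text = part.get_content()
        except LookupError:
            continue  # unknown charset: give up on this part in this sketch
        if part.get_content_subtype() == "html":
            # Turns references such as &#104; (decimal) or &#x68; (hex)
            # back into plain characters before searching for URLs.
            text = html.unescape(text)
        for url in URL_RE.findall(text):
            host = urlsplit(url).hostname
            if host:
                hosts.add(host.lower())
    return hosts
```

Unlike the parser described above, this makes several passes over each part and does nothing about renaming dangerous attachments; it only shows how the decoding layers stack before any URI can be looked at.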
Hi Markus, Welcome to the SURBL community. :-) I think your URI parser could be of immense interest, perhaps as something like a light, fast preprocessor to MTAs such as sendmail, etc. It could also be of potential use to SpamAssassin developers, though I'll admit to being less than fully aware of the actual development procedures of that large open source effort. (I am essentially a spectator on the sidelines of that sport.) You can find the SpamAssassin mailing lists at:
http://www.spamassassin.org/lists.html
What I would suggest doing is to publish the parser at an open source haven such as SourceForge.net so that people can see it, get it, comment on it, improve, update and extend it, incorporate it externally, etc.
I must admit that I don't know how SpamAssassin works in detail, nor do I currently have a Linux based e-mail server setup. I do however have some PCs standing around that I could set up this way, and I also used to work with Unix and Linux for several years in the past.
If you can follow FAQs and wikis, you should be able to do a fresh install of Linux/BSD, SpamAssassin and the MTA of your choice fairly painlessly. It may be a useful way for you to see how the pieces fit together if you're so inclined. Many people find that setting up a personal box is a useful learning experience. It can also then be a very standard test platform.
Apart from doing whatever is possible to get the spam problem somewhat under control, my interest would be to get as many spamvertised domains as possible. I understand that I could already read live SURBL data and even add such functionality to the firmware, but I did not want to do this without first asking and also without offering help.
I'd encourage using URI RBLs such as SURBL for the source data. That gains you the advantage of many networks of live data sources. For example, sc.surbl.org is highly dynamic, perhaps too much so currently. :-)
RBLs are a good way to share this kind of information.
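As a rough illustration of what reading the data looks like from the client side: a URI RBL is queried over DNS by prepending the domain to the list's zone and asking for an A record. An answer in 127.0.0.0/8 means the domain is listed; a failed lookup means it is not. The sketch below uses the sc.surbl.org zone mentioned above; the function names and the injectable resolve parameter are assumptions made for the example, not part of any SURBL tooling.

```python
import socket


def surbl_query_name(domain, zone="sc.surbl.org"):
    """Build the DNS name to look up, e.g. example.com.sc.surbl.org."""
    return f"{domain.strip('.').lower()}.{zone}"


def is_listed(domain, zone="sc.surbl.org", resolve=socket.gethostbyname):
    """Return True if the list answers with a 127.x.x.x address.

    `resolve` is injectable (a hypothetical convenience) so the logic can
    be exercised without touching the network.
    """
    try:
        answer = resolve(surbl_query_name(domain, zone))
    except OSError:
        # Includes socket.gaierror: name not found, so not listed.
        return False
    return answer.startswith("127.")
```

The caller is expected to pass the registered domain taken from the URI rather than a full host name, and a real client would want timeouts and caching, which are left out here.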
I also have some additional ideas on how to get more spamvertised domains and on other issues, but I think this posting got a little long already :-)
Please feel free to share them. We all benefit from the sharing of ideas.
Jeff C.