Hi all
I found out about the SURBL project a few days ago and was very excited, but then a bit frustrated not to find any contact information. Well, it seems I did not look carefully enough, but then Wayne pointed me to this list, and, well, here I am.
I don't know the culture of this list yet, but I'll take the "risk" of introducing myself instead of passively reading for a couple of days first. I hope I don't offend anyone by doing so.
My name is Markus Zingg and I live in Switzerland, Europe. English is not my native language, so I hope you will bear with me if I sound odd or plain wrong :-). I have been developing all kinds of software, mostly in 'C', for the last 20+ years.
I'm currently working on a project in which we have developed an embedded email server. It is a 4x4x1" box which, apart from being an SMTP, POP3 and WebMail server, also contains a spam filter. In short, and apart from some other methods, the filter works by extracting URIs from e-mails and matching them against a blacklist database. I'm happy to post the URL of a site describing the box in detail if you want me to.
The box, which does not have the luxury of a superfast Pentium processor, must do the filtering very efficiently. Therefore all of the firmware is implemented in 'C', with some parts even written in RISC processor assembler.
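The URI-matching step can be illustrated with a minimal C sketch, assuming the message body has already been decoded to plain text: scan for `http://` and `https://`, cut out the host name, and compare it (and its parent domains) against a blacklist. The function names and the two sample domains are hypothetical; a real filter would also handle other URI forms (bare `www.` links, IP addresses) and consult its own database rather than a fixed array.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sample entries; a real filter would consult its own
   database (or a URI RBL such as SURBL) instead of a fixed array. */
static const char *const blacklist[] = {
    "spamvertised.example", "cheap-pills.test", NULL
};

/* Case-insensitive prefix test; prefix must be given in lowercase. */
static int starts_with_ci(const char *s, const char *prefix)
{
    for (; *prefix != '\0'; s++, prefix++)
        if (tolower((unsigned char)*s) != *prefix)
            return 0;
    return 1;
}

/* 1 if host equals a listed domain or is a subdomain of one
   ("www.spamvertised.example" matches "spamvertised.example"). */
static int is_blacklisted(const char *host)
{
    size_t hlen = strlen(host);
    for (const char *const *d = blacklist; *d != NULL; d++) {
        size_t dlen = strlen(*d);
        if (hlen < dlen || strcmp(host + hlen - dlen, *d) != 0)
            continue;                       /* not a suffix of host */
        if (hlen == dlen || host[hlen - dlen - 1] == '.')
            return 1;                       /* exact or on a label boundary */
    }
    return 0;
}

/* Copies the host part starting at p into out, lowercased; stops at the
   first character that cannot appear in a host name ('/', ':', '?', ...). */
static const char *copy_host(const char *p, char *out, size_t outsz)
{
    size_t n = 0;
    while (isalnum((unsigned char)*p) || *p == '.' || *p == '-') {
        if (n + 1 < outsz)
            out[n++] = (char)tolower((unsigned char)*p);
        p++;
    }
    out[n] = '\0';
    return p;
}

/* Scans decoded text for http:// and https:// URIs and counts how many
   of their hosts are on the blacklist. */
static int count_blacklisted_uris(const char *text)
{
    char host[256];
    const char *p = text;
    int hits = 0;

    while (*p != '\0') {
        size_t skip = starts_with_ci(p, "http://")  ? 7
                    : starts_with_ci(p, "https://") ? 8 : 0;
        if (skip == 0) {
            p++;
            continue;
        }
        p = copy_host(p + skip, host, sizeof host);
        if (host[0] != '\0' && is_blacklisted(host))
            hits++;
    }
    return hits;
}
```

For example, `count_blacklisted_uris("see http://www.spamvertised.example/offer")` returns 1, because `www.spamvertised.example` is a subdomain of a listed entry. Matching on label boundaries keeps `notspamvertised.example` from matching by accident.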
To parse the e-mails and split them into their MIME parts, I wrote a special parser that does everything needed in a single pass (MIME parsing, content transfer decoding, decoding of hex- and decimal-encoded HTML sections, etc.) and parses the textual parts while skipping attachments. The parser, of course, takes into account whether it is working on an HTML part or a plain-text part, and it isn't fooled by any of the tricks spammers have used so far. It is extremely efficient, so I thought it might be of interest to the SURBL project. As a side effect, it can also detect dangerous attachments by looking at the filename extension and, if configured to do so, rename them on the fly. I suspect, though, that this latter part might not go together well with SpamAssassin.
Since it was written for this embedded hardware, some effort is needed to make it generally usable. I have partially done that already, in order to test the filter more efficiently against the thousands of spam samples I have collected over the years and, of course, against the spam currently coming in.
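Numeric character references are one common way to hide URLs in HTML: `&#104;&#116;&#116;&#112;` renders as `http` in a mail client but defeats a naive text search. Below is a minimal sketch of decoding them in place, assuming they are the kind of hex and decimal encoding meant here. It is not the parser described above, which does this as part of its single pass. The function name `decode_numeric_refs` is hypothetical, and only references to ASCII code points 1 to 127 are decoded; everything else is copied through unchanged.

```c
#include <ctype.h>
#include <stddef.h>

/* Decodes HTML numeric character references (&#104; and &#x68;) in place.
   Only ASCII code points 1..127 are decoded; other references are copied
   unchanged. The trailing ';' is optional, since browsers also decode
   "&#97mple" as "ample". Returns the new length of the string. */
static size_t decode_numeric_refs(char *s)
{
    char *out = s;
    const char *in = s;

    while (*in != '\0') {
        if (in[0] == '&' && in[1] == '#') {
            const char *p = in + 2;
            int hex = (*p == 'x' || *p == 'X');
            long value = 0;
            int digits = 0;

            if (hex)
                p++;
            while (hex ? isxdigit((unsigned char)*p) : isdigit((unsigned char)*p)) {
                int d = isdigit((unsigned char)*p)
                            ? *p - '0'
                            : tolower((unsigned char)*p) - 'a' + 10;
                if (value < 128)            /* stop growing once out of range */
                    value = value * (hex ? 16 : 10) + d;
                digits++;
                p++;
            }
            if (digits > 0 && value > 0 && value < 128) {
                if (*p == ';')
                    p++;
                *out++ = (char)value;       /* one byte replaces >= 3 chars */
                in = p;
                continue;
            }
        }
        *out++ = *in++;                     /* ordinary character: copy */
    }
    *out = '\0';
    return (size_t)(out - s);
}
```

Because a decoded reference is always shorter than the `&#...;` text it replaces, the output can overwrite the input without an extra buffer, which suits a memory-constrained device. Leading zeros (`&#0000104;`) are handled, since they only add digits without changing the value.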
I must admit that I don't know in detail how SpamAssassin works, nor do I currently have a Linux-based e-mail server set up. I do, however, have some PCs standing around that I could set up this way, and I worked with Unix and Linux for several years in the past.
Apart from doing whatever is possible to get the spam problem somewhat under control, my interest is in getting as many spamvertised domains as possible. I understand that I could already use live SURBL data and even add such functionality to the firmware, but I did not want to do this without asking first and without offering help in return. I also have some additional ideas on how to get more spamvertised domains, and on other issues, but I think this posting has gotten a little long already :-)
I honestly have no idea whether what I can offer is of any interest here. If not, please accept my apologies for disturbing you.
Markus
On Wednesday, April 14, 2004, 4:29:36 PM, Markus Zingg wrote:
> To parse the e-mails and split them into their MIME parts, I wrote a special parser that does everything needed in a single pass (MIME parsing, content transfer decoding, decoding of hex- and decimal-encoded HTML sections, etc.) and parses the textual parts while skipping attachments. The parser, of course, takes into account whether it is working on an HTML part or a plain-text part, and it isn't fooled by any of the tricks spammers have used so far. It is extremely efficient, so I thought it might be of interest to the SURBL project. As a side effect, it can also detect dangerous attachments by looking at the filename extension and, if configured to do so, rename them on the fly. I suspect, though, that this latter part might not go together well with SpamAssassin.
> Since it was written for this embedded hardware, some effort is needed to make it generally usable. I have partially done that already, in order to test the filter more efficiently against the thousands of spam samples I have collected over the years and, of course, against the spam currently coming in.
Hi Markus, welcome to the SURBL community. :-) I think your URI parser could be of great interest, perhaps as a light, fast preprocessor for MTAs such as sendmail. It could also be useful to SpamAssassin developers, though I'll admit to being less than fully aware of the actual development procedures of that large open source effort. (I am essentially a spectator on the sidelines of that sport.) You can find the SpamAssassin mailing lists at:
http://www.spamassassin.org/lists.html
I would suggest publishing the parser at an open source haven such as SourceForge.net, so that people can see it, get it, comment on it, improve, update and extend it, incorporate it into other projects, etc.
> I must admit that I don't know in detail how SpamAssassin works, nor do I currently have a Linux-based e-mail server set up. I do, however, have some PCs standing around that I could set up this way, and I worked with Unix and Linux for several years in the past.
If you can follow FAQs and wikis, you should be able to do a fresh install of Linux or BSD, SpamAssassin, and the MTA of your choice fairly painlessly. If you're so inclined, it may be a useful way to see how the pieces fit together. Many people find setting up a personal box a useful learning experience, and it can then also serve as a very standard test platform.
> Apart from doing whatever is possible to get the spam problem somewhat under control, my interest is in getting as many spamvertised domains as possible. I understand that I could already use live SURBL data and even add such functionality to the firmware, but I did not want to do this without asking first and without offering help in return.
I'd encourage using URI RBLs such as SURBL for the source data. That gives you the advantage of live data gathered from many networks. For example, sc.surbl.org is highly dynamic, perhaps too much so currently. :-)
RBLs are a good way to share this kind of information.
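Querying a URI RBL works like a conventional DNS blacklist lookup: the domain is prepended to the list's zone, the resulting name is resolved as an A record, and an answer means the domain is listed while NXDOMAIN means it is not. Below is a minimal sketch using POSIX `getaddrinfo`, with the zone mentioned in this thread (`sc.surbl.org`). Assumptions: the function names are hypothetical; any answer in 127.0.0.0/8 is treated as "listed" (check the zone's documentation for its exact return codes); and the caller passes the registered domain (e.g. `example.com` rather than `www.example.com`), since reducing a host name to that form is not shown.

```c
#define _POSIX_C_SOURCE 200112L
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Builds "<domain>.<zone>", e.g. "example.com.sc.surbl.org".
   Returns 1 on success, 0 if the name did not fit in out. */
static int build_surbl_query(const char *domain, const char *zone,
                             char *out, size_t outsz)
{
    int n = snprintf(out, outsz, "%s.%s", domain, zone);
    return n > 0 && (size_t)n < outsz;
}

/* Assumption: any A record in 127.0.0.0/8 means "listed". */
static int is_listed_answer(uint32_t addr_host_order)
{
    return (addr_host_order >> 24) == 127;
}

/* Returns 1 if listed, 0 if not listed (NXDOMAIN or lookup failure),
   -1 if the query name was too long. Blocks until the resolver answers. */
int surbl_lookup(const char *domain, const char *zone)
{
    char qname[256];
    struct addrinfo hints, *res, *ai;
    int listed = 0;

    if (!build_surbl_query(domain, zone, qname, sizeof qname))
        return -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;        /* RBL answers are A records */
    hints.ai_socktype = SOCK_STREAM;  /* avoid duplicate results per socket type */
    if (getaddrinfo(qname, NULL, &hints, &res) != 0)
        return 0;

    for (ai = res; ai != NULL; ai = ai->ai_next) {
        const struct sockaddr_in *sin = (const struct sockaddr_in *)ai->ai_addr;
        if (is_listed_answer(ntohl(sin->sin_addr.s_addr)))
            listed = 1;
    }
    freeaddrinfo(res);
    return listed;
}
```

`getaddrinfo` blocks until the resolver answers, so firmware handling many messages would probably want an asynchronous resolver or a small local cache of recent results; that is a design choice the sketch leaves open.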
> I also have some additional ideas on how to get more spamvertised domains, and on other issues, but I think this posting has gotten a little long already :-)
Please feel free to share them. We all benefit from the sharing of ideas.
Jeff C.