Re: [SURBL-Discuss] RE: (1) Another Possible FP, and (2) header parsing issues

16 Aug 2004


      "Jeff Chan" jeffc@surbl.org wrote:
...
Or how about an authenticated spam recipient address at Joe's?
In other words, a place to mail spam into Joe's system.
Currently my filter can check data from two sources:
1) individual mail files on disk, and
2) messages in a remote POP3 mailbox.
(Extending it to the mailbox archive format wouldn't be too difficult.)
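If the filter were extended to mailbox archive (mbox) files, Python's standard `mailbox` module already parses that format. A minimal sketch, assuming the scanner can consume `email.message.Message` objects; the function name `iter_mbox_messages` is hypothetical:

```python
import mailbox
from email.message import Message
from typing import Iterator


def iter_mbox_messages(path: str) -> Iterator[Message]:
    """Yield every message stored in an mbox archive, in file order."""
    # create=False raises mailbox.NoSuchMailboxError instead of silently
    # creating an empty archive when the path is wrong.
    box = mailbox.mbox(path, create=False)
    try:
        for msg in box:
            yield msg
    finally:
        box.close()
```

Usage would be along the lines of `for msg in iter_mbox_messages("spamtrap.mbox"): scan(msg)`, where `scan` stands in for the existing filter entry point (also hypothetical).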
Using 1), I currently accept submissions from third parties that are forwarded
as mail attachments (Content-Type: message/rfc822). I first drag them into a
folder and then run the filter on that folder.
To handle large volumes in real time, I would have to automate that step and
verify that the senders are trusted.
If there were demand for submission by attachment, I could write the
code needed to extract attachments from mail received at dedicated
mailboxes and run the scanner on them.
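Automating the forwarded-attachment workflow mostly means pulling the message/rfc822 parts out of each submission and checking who sent it. A minimal sketch using Python's `email` package; the allow-list `TRUSTED_SUBMITTERS` and the function name are hypothetical, and a From-header check alone is easy to forge, so a real setup would pair it with authentication such as SMTP AUTH or a secret submission address:

```python
import email
from email import policy
from email.message import EmailMessage
from email.utils import parseaddr

# Hypothetical allow-list of trusted reporters (lower-case addresses).
TRUSTED_SUBMITTERS = frozenset({"reporter@example.org"})


def extract_forwarded_spam(raw: bytes,
                           trusted: frozenset = TRUSTED_SUBMITTERS) -> list[EmailMessage]:
    """Return the message/rfc822 attachments of one submission mail.

    Returns an empty list if the submitter is not trusted or nothing was attached.
    """
    outer = email.message_from_bytes(raw, policy=policy.default)
    _, sender = parseaddr(outer.get("From", ""))
    if sender.lower() not in trusted:
        return []
    spams = []
    # Only the top-level parts are examined, so message/rfc822 parts nested
    # inside a forwarded spam are not mistaken for extra submissions.
    for part in outer.iter_parts():
        if part.get_content_type() == "message/rfc822":
            spams.append(part.get_payload(0))  # the embedded message object
    return spams
```

Each returned message could then be fed to the same scanner that currently reads files from disk.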
"Rik van Riel" riel@surriel.com wrote:
...
On Sun, 15 Aug 2004, Joe Wein wrote:
...
If you make your spamtrap mailboxes accessible to me,
I could automatically parse any number of them as long
as I can get POP3 access from here.
That would work, as long as you're willing to suck down
about 1GB of spam per day.  I already have my spam in a
news spool, so it can just be sucked down...
No POP3 of course, since I'm not aware of any mail software
that scales to mailboxes with over 100k pieces of mail a day.
Hi Rik,
thanks for your offer :-)
I won't be able to do much before September, as I'm going on vacation
later this week. Initially I was interested in a subset small enough for me
to handle as is, but I can also see a lot of potential in processing the
whole data set in real time. I think I can extend my filter to cope with the
kind of volume you describe. I'll have to rethink how I log and archive the
data, as I probably won't want to archive all spam (as I currently do), only
the messages that caused new listings.
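Archiving only the messages that caused new listings reduces to a set difference between the domains found in a message and the domains already listed. A minimal sketch under that assumption; the function name and the content-hash file naming are illustrative choices, not part of the existing filter:

```python
import hashlib
from pathlib import Path


def archive_if_new_listing(raw: bytes, domains: set[str], listed: set[str],
                           archive_dir: Path) -> set[str]:
    """Archive a spam only if it yields domains that are not listed yet.

    `domains` are the domains extracted from this message; `listed` is the
    running set of already-listed domains and is updated in place.
    Returns the newly listed domains (an empty set means nothing was archived).
    """
    new = domains - listed
    if new:
        # Name the file after a content hash so a re-submitted copy
        # overwrites the earlier file instead of duplicating it.
        name = hashlib.sha256(raw).hexdigest() + ".eml"
        (archive_dir / name).write_bytes(raw)
        listed |= new
    return new
```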
I will look into NNTP to see how much new code it would take to
retrieve spam that way. I could probably reuse quite a bit of my
existing POP3 code.
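One piece that carries over directly: POP3 (RFC 1939, e.g. after RETR) and NNTP (RFC 3977, e.g. after ARTICLE) both return a message as a multi-line response that ends with a line holding a single "." and that "dot-stuffs" any content line beginning with ".". A minimal sketch of a shared decoder; the function name is hypothetical:

```python
from typing import Iterable


def read_dot_terminated(lines: Iterable[bytes]) -> list[bytes]:
    """Decode the body of a POP3/NNTP multi-line response (status line already read)."""
    body = []
    for line in lines:
        line = line.rstrip(b"\r\n")
        if line == b".":
            return body          # terminating line: end of this response
        if line.startswith(b"."):
            line = line[1:]      # undo dot-stuffing
        body.append(line)
    raise ValueError("stream ended before the terminating '.' line")
```

With a raw socket, this could be called as `read_dot_terminated(sock.makefile("rb"))` once the status line of the RETR or ARTICLE reply has been consumed.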
As for performance, I can currently handle about 60K messages per day, but I
expect I could speed that up significantly. I currently check messages against
SBL+XBL, and the DNS lookups account for most of the elapsed time, but those
checks wouldn't be strictly necessary for "known bad" feed data.
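A DNSBL check like the SBL+XBL one reverses the IPv4 octets, appends the list's zone, and treats a successful A-record lookup (an address in 127.0.0.x) as "listed". A minimal sketch, assuming the zone name `sbl-xbl.spamhaus.org` used for the combined list at the time, a `skip` flag for trap feeds where the check isn't strictly needed, and one query per distinct address; the function names are hypothetical and the resolver is injectable so the logic can be tested offline:

```python
import ipaddress
import socket
from typing import Callable, Iterable, Optional

# Assumed zone for the combined SBL+XBL list; substitute the zone you query.
DNSBL_ZONE = "sbl-xbl.spamhaus.org"


def dnsbl_query_name(ip: str, zone: str = DNSBL_ZONE) -> str:
    """192.0.2.10 -> 10.2.0.192.<zone> (IPv4 octets reversed)."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def dnsbl_hits(ips: Iterable[str], skip: bool = False, zone: str = DNSBL_ZONE,
               resolve: Callable[[str], str] = socket.gethostbyname) -> Optional[set[str]]:
    """Return the addresses listed in `zone`, or None if the check was skipped.

    Pass skip=True for "known bad" spamtrap feeds, where the check isn't
    strictly needed and would otherwise dominate the run time.
    """
    if skip:
        return None
    hits = set()
    for ip in set(ips):                          # query each distinct address once
        try:
            resolve(dnsbl_query_name(ip, zone))  # listed -> resolves to 127.0.0.x
        except socket.gaierror:
            continue                             # NXDOMAIN or failure: treated as not listed
        hits.add(ip)
    return hits
```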
...
One question though: how many GB/day of spamtrap
mail is Joe Wein able to handle? ;)
I may only be getting one GB/day now, but in the long run
the only scalable solution will be to have the software
that analyses the spam available to others.
I agree: in the long term it will have to run on multiple hosts,
otherwise it won't scale.
A million spams a day sounds interesting :-)
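For a sense of scale using the figures in this thread (a million messages per day versus roughly 60K per day per instance today), the sustained rates and the number of instances needed at today's rate, ignoring the speedups expected above, work out as:

```latex
\frac{10^{6}\ \text{messages/day}}{86\,400\ \text{s/day}} \approx 11.6\ \text{messages/s},
\qquad
\frac{6\times10^{4}\ \text{messages/day}}{86\,400\ \text{s/day}} \approx 0.69\ \text{messages/s}

N_{\text{hosts}} = \left\lceil \frac{10^{6}}{6\times10^{4}} \right\rceil = \lceil 16.7 \rceil = 17
```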
...
The only condition would be that the mail in question
isn't made available to others, since that would
expose the spamtrap addresses, breaking a promise
I made to the guy who pointed one of the spamtrap
domains at me...
No problem with that. The content of a message only matters when a
listing is challenged, and even then I currently don't reveal the recipient
address.
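If evidence from such a feed ever had to be shown, the trap addresses could be scrubbed from the raw message before sharing. A minimal sketch, assuming the list of trap addresses is known; the function name and placeholder are hypothetical. It replaces literal occurrences anywhere in the bytes (To/Cc, Delivered-To, Received "for" clauses, plain-text bodies) but cannot see addresses inside base64 or quoted-printable parts:

```python
import re
from typing import Iterable


def redact_trap_addresses(raw: bytes, trap_addresses: Iterable[str],
                          placeholder: bytes = b"trap@redacted.invalid") -> bytes:
    """Replace every literal, case-insensitive occurrence of the trap addresses."""
    # Longest first, so an address that is a prefix of another can't match early.
    addrs = sorted({a.lower() for a in trap_addresses}, key=len, reverse=True)
    if not addrs:
        return raw  # an empty alternation would match at every position
    pattern = re.compile(b"|".join(re.escape(a.encode("ascii")) for a in addrs),
                         re.IGNORECASE)
    # A function replacement keeps any backslashes in the placeholder literal.
    return pattern.sub(lambda _m: placeholder, raw)
```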
Cheers!
Joe
-- 
http://www.joewein.de/sw/jwSpamSpy/
