Here's some additional info about the PJ data. Raymond is now processing about 300k spams per day and feeding them to Joe Wein for processing. This has increased the spam detection for PJ:
SpamAssassin tag hits (top 100):

#1   103430  URIBL_WS_SURBL
#2   101346  URIBL_PJ_SURBL
#3    91324  BAYES_99
#4    90939  URIBL_SBL
#5    85476  RCVD_IN_BL_SPAMCOP_NET
#6    85092  URIBL_OB_SURBL
#7    81798  HTML_MESSAGE
#8    67027  URIBL_SC_SURBL
#9    56416  URIBL_AB_SURBL
#10   48047  MIME_HTML_ONLY
Jeff C.
Here's a little more history about the processing Prolocation and Joe Wein are doing. Raymond is running copies of Joe's engine on three servers at his site. Each server is fed about 100k likely spams per day from multiple spamtraps and from the abuse department of the ISP Raymond works at. This has greatly increased the amount of data Joe sees, on top of what comes from his own systems and traps.
Prolocation's JW-processed messages are also fed into Joe's systems so that there is a consistent record as "evidence" of the spams. But the extracted URI data of both Joe and Raymond are combined at Prolocation before they get fed into WS. It's that combined data, all processed with Joe's engine (but at two different locations, Joe's and Raymond's) that we are thinking about breaking out into its own list (at least within multi).
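To make the "combined data" step concrete, here is a minimal Python sketch of merging the URI extracts from the two engine locations into one de-duplicated list. It assumes each extract is plain text with one domain per line and optional '#' comment lines; the function names and sample domains are hypothetical (apart from one domain quoted later in this thread) and are not Prolocation's or Joe's actual tooling.

```python
from typing import Iterable, List, Optional, Set


def normalize_domain(raw: str) -> Optional[str]:
    """Lowercase a domain and strip whitespace and any trailing dot.

    Returns None for blank lines and '#' comment lines.
    """
    domain = raw.strip().lower().rstrip(".")
    if not domain or domain.startswith("#"):
        return None
    return domain


def combine_uri_feeds(*feeds: Iterable[str]) -> List[str]:
    """Merge one-domain-per-line feeds into a single sorted, de-duplicated list."""
    combined: Set[str] = set()
    for feed in feeds:
        for line in feed:
            domain = normalize_domain(line)
            if domain is not None:
                combined.add(domain)
    return sorted(combined)


# Hypothetical extracts from the two engine locations.
joe_extract = ["cheap-pills.example", "Replica-Watches.Example.", "# comment"]
raymond_extract = ["replica-watches.example", "abasedly366tabs.us"]

print(combine_uri_feeds(joe_extract, raymond_extract))
# ['abasedly366tabs.us', 'cheap-pills.example', 'replica-watches.example']
```

Normalizing before merging means the same domain reported by both sites, in different case or with a trailing dot, is listed only once.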
I was unaware of the scale of things on Raymond's side; it is a significant upgrade to the capabilities of Joe's systems in terms of spam volume, servers, people, etc.
The Prolocation and Joe Wein collaboration is also consistent since the same engine and standards are being applied to some large, independent sources of spam.
All of this leads me to think that the Prolocation and JW data are really a major operation of their own and deserve to have their own list.
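If the data does become its own list within multi, clients would tell it apart from WS by a bit of its own in the returned 127.0.0.x address. The following Python sketch decodes such an answer. The bit values in LIST_BITS are illustrative assumptions for this sketch, not an official assignment.

```python
import ipaddress
from typing import List

# Illustrative bit assignments only (an assumption, not an official table):
# each list in the combined "multi" zone owns one bit of the last octet.
LIST_BITS = {
    "SC": 2,
    "WS": 4,
    "PH": 8,
    "OB": 16,
    "AB": 32,
    "JP": 64,  # the proposed separate list
}


def decode_multi_answer(answer: str) -> List[str]:
    """Return the lists whose bits are set in a 127.0.0.x multi answer."""
    addr = ipaddress.IPv4Address(answer)
    if not addr.is_loopback:
        raise ValueError(f"unexpected answer outside 127.0.0.0/8: {answer}")
    last_octet = int(addr) & 0xFF
    return [name for name, bit in LIST_BITS.items() if last_octet & bit]


print(decode_multi_answer("127.0.0.68"))  # 68 = 64 + 4
# ['WS', 'JP']
```

Because each list owns a separate bit, a single lookup can report a domain as listed in both WS and JP, and a receiving site can still score the two hits independently.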
By the way I'm proposing to call the list JP instead of PJ, since PJ is too similar to PH. The difference should result in less confusion.
Comments?
Jeff C.
Hi!
> Here's a little more history about the processing Prolocation and Joe Wein are doing. Raymond is running copies of Joe's engine on three servers at his site. Each server is fed about 100k likely spams per day from multiple spamtraps and from the abuse department of the ISP Raymond works at. This has greatly increased the amount of data Joe sees, on top of what comes from his own systems and traps.
Besides that, Joe also gets a direct feed; we are still tuning it to see how much his ISP can handle ;) We already noticed that their servers collapsed under my total feed. OOPS. We have traps on around 600 domains, some of them pretty large ones.
> By the way I'm proposing to call the list JP instead of PJ, since PJ is too similar to PH. The difference should result in less confusion.
Sounds fine by me.
Anyone?
Bye, Raymond.
Jeff Chan wrote:
> Here's a little more history about the processing Prolocation and Joe Wein are doing. Raymond is running copies of Joe's engine on three servers at his site. Each server is fed about 100k likely spams per day from multiple spamtraps and from the abuse department of the ISP Raymond works at. This has greatly increased the amount of data Joe sees, on top of what comes from his own systems and traps.
> By the way I'm proposing to call the list JP instead of PJ, since PJ is too similar to PH. The difference should result in less confusion.
> Comments?
Just curious: how long does it then take from the moment a message shows up and is recognized as spam until it is available to "the public" in whichever zone?
Alex
Hi!
>> By the way I'm proposing to call the list JP instead of PJ, since PJ is too similar to PH. The difference should result in less confusion.
>> Comments?
> Just curious: how long does it then take from the moment a message shows up and is recognized as spam until it is available to "the public" in whichever zone?
Best case, 15-20 minutes. Worst case, 1-6 hours, depending on how much handwork is needed. But that's much better than, for example, the MS data feed (that is in no way meant as criticism, it's just something we notice): those are updated once every few days, more or less depending on their available time, I guess. Some days MS gets updated 2-3 times a day, other days there are no updates.
Especially with domains like these:

abasedly366tabs.us abasia9773rneds.com akimbo6968tabs.us

those get added automatically now, so there won't be much delay in stopping any new ones they start using once we notice them.
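The thread does not say how the automatic adding works. As a minimal sketch, assuming a simple regular-expression rule modeled only on the three sample domains (a word, 3-4 digits, another word, then .us or .com), a candidate filter could look like this. The pattern and function names are hypothetical, and a rule this loose would likely match some legitimate domains as well, so one would expect a review step before anything is listed.

```python
import re
from typing import Iterable, List

# Hypothetical rule derived only from the three sample domains in this thread:
# a lowercase word, 3-4 digits, another lowercase word, then .us or .com.
GENERATED_DOMAIN = re.compile(r"[a-z]{4,}\d{3,4}[a-z]{3,}\.(?:us|com)")


def looks_generated(domain: str) -> bool:
    """True if the whole domain matches the hypothetical generated-name pattern."""
    return GENERATED_DOMAIN.fullmatch(domain.strip().lower()) is not None


def auto_add_candidates(domains: Iterable[str]) -> List[str]:
    """Return the sorted, de-duplicated domains that match the pattern."""
    return sorted({d.strip().lower() for d in domains if looks_generated(d)})


seen = [
    "abasedly366tabs.us",
    "abasia9773rneds.com",
    "akimbo6968tabs.us",
    "surbl.org",
    "spamtrap123.com",
]
print(auto_add_candidates(seen))
# ['abasedly366tabs.us', 'abasia9773rneds.com', 'akimbo6968tabs.us']
```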
If we somehow get better funding for the whole project this might improve, but I don't think we do badly; in fact, I think we do amazingly well. And let me say again that without all the work of the people maintaining datasets and lists, and even the people giving feedback or reporting FPs, this project would not exist. Thanks!!
Bye, Raymond.
----- Original Message -----
From: "Jeff Chan" <jeffc@surbl.org>
> All of this leads me to think that the Prolocation and JW data are really a major operation of their own and deserve to have their own list.
> By the way I'm proposing to call the list JP instead of PJ, since PJ is too similar to PH. The difference should result in less confusion.
> Comments?
Thanks, Jeff, for the additional background information. I would agree that JP should be run as a separate list. This will allow us the flexibility to weight the different lists according to their accuracy and our individual levels of FP tolerance in our own environments.
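Bill's point about weighting can be illustrated with a small Python sketch: each site assigns its own weight per list and sums the weights of the distinct lists a message's URIs hit. This is similar in spirit to giving each URIBL_*_SURBL rule its own `score` in SpamAssassin; the weights below are made-up examples, not recommendations.

```python
from typing import Dict, Iterable

# Made-up example weights; each site would tune these to its own FP tolerance.
DEFAULT_WEIGHTS: Dict[str, float] = {"SC": 3.0, "WS": 2.0, "JP": 1.5, "OB": 1.0}


def uri_score(lists_hit: Iterable[str], weights: Dict[str, float]) -> float:
    """Sum the weights of the distinct lists that matched; unknown lists add 0."""
    return sum(weights.get(name, 0.0) for name in set(lists_hit))


# A more FP-averse site trusts the new list less, leaving the other weights alone.
conservative = {**DEFAULT_WEIGHTS, "JP": 0.5}

hits = ["WS", "JP"]
print(uri_score(hits, DEFAULT_WEIGHTS))  # 3.5
print(uri_score(hits, conservative))     # 2.5
```

This only works if JP is a separate list: if its data were still merged into WS, there would be no way to lower its weight without also lowering WS.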
Bill
Hi!
>> Comments?
> Other ideas for PJ, or rather JP:
> Joe + Raymond = JR ?
> Wein + Dijkxhoorn => vintage dike = VD ? <veg>
> Bye, Frank
Sure, but I am not the only one doing the work; our whole abuse department is contributing to the project. So it would be fairer not to name it after individuals like that ;)
JP sounds fine to me.
Bye, Raymond.