Here's some additional info about the PJ data. Raymond is now
processing about 300k spams per day and feeding them to Joe
Wein for processing. This has increased the spam detection
for PJ:
SpamAssassin tag hits: (top 100)
#1 103430 URIBL_WS_SURBL
#2 101346 URIBL_PJ_SURBL
#3 91324 BAYES_99
#4 90939 URIBL_SBL
#5 85476 RCVD_IN_BL_SPAMCOP_NET
#6 85092 URIBL_OB_SURBL
#7 81798 HTML_MESSAGE
#8 67027 URIBL_SC_SURBL
#9 56416 URIBL_AB_SURBL
#10 48047 MIME_HTML_ONLY
Jeff C.
--
Jeff Chan
mailto:jeffc@surbl.org
http://www.surbl.org/
This is a list that MailPolice hosts and I have been running it for a few
hours and it has already flagged some phish and fraud e-mails. Here is some
info about the list: http://rhs.mailpolice.com/#rhsfraud
This is my configuration for SA 2.64 with the SpamCopURI plug-in:
uri MP_URI_RBL eval:check_spamcop_uri_rbl('fraud.rhs.mailpolice.com','127.0.0.2')
describe MP_URI_RBL URI's domain appears in MailPolice fraud list
tflags MP_URI_RBL net
score MP_URI_RBL 2.0
And for SA 3.0 with the URIDNSBL plug-in:
urirhsbl URIBL_MP fraud.rhs.mailpolice.com. A
header URIBL_MP eval:check_uridnsbl('URIBL_MP')
describe URIBL_MP URI's domain appears in MailPolice fraud list
tflags URIBL_MP net
score URIBL_MP 2.0
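For anyone curious what these rules do on the wire, here is a rough Python sketch of a right-hand-side blocklist lookup: the URI's host name is prepended to the zone and an A record is requested, and a listing is signalled by the 127.0.0.2 answer used in the rules above. The function names are made up for illustration. Real plug-ins also reduce the host to its registered domain first, and that step is left out here.

```python
import socket
from typing import List


def rhsbl_query_name(uri_host: str, zone: str = "fraud.rhs.mailpolice.com") -> str:
    """Build the DNS name that would be queried for uri_host in the given zone."""
    host = uri_host.strip().rstrip(".").lower()
    return f"{host}.{zone.rstrip('.')}"


def is_listed(answers: List[str], expected: str = "127.0.0.2") -> bool:
    """A host counts as listed when the expected address is among the A records."""
    return expected in answers


def lookup(uri_host: str, zone: str = "fraud.rhs.mailpolice.com") -> List[str]:
    """Return the A records for the query name, or an empty list if none exist."""
    try:
        _name, _aliases, addresses = socket.gethostbyname_ex(rhsbl_query_name(uri_host, zone))
        return addresses
    except socket.gaierror:
        # No such name in the zone: the host is simply not listed.
        return []
```

Calling `is_listed(lookup("example.com"))` would perform a live DNS query, so it is best tried against a zone you are allowed to query.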
Bill
As you know WS has data from several different sources:
1. Bill Stearns' sa-blacklist
2. Chris Santerre and the SARE Ninjas' former BigEvil and MidEvil lists,
and other newer ones.
3. Joe Wein's jwSpamSpy traps
4. Raymond's Prolocation traps and manual list.
5. MailSecurity lists
and probably many others I'm not even aware of. So WS has
become a collection of many different data sources. In some
cases, such as for the jw data, my initial thought was to set it
up as a separate list, but it was somewhat easier to let them all
be added together. Raymond is currently feeding spamtrap data
into Joe's system also.
Raymond and I were looking at some of the data sources in WS and
their spam detection and false positive rates, and we tried an
experiment of checking his Prolocation spam data with Joe Wein's
to see what the results would be like. We called that list "PJ"
for "Prolocation and Joe" and found that the FP rate on one large
corpus was significantly lower than WS, while the spam detection
rate was approximately the same:
OVERALL%   SPAM%     HAM%    S/O    RANK  SCORE  NAME
2424443    2357143   67300   0.972  0.00  0.00   (all messages)
100.000    97.2241   2.7759  0.972  0.00  0.00   (all messages as %)
  7.595     7.8122   0.0045  0.999  1.00  0.00   URIBL_SC_SURBL
 76.754    78.9448   0.0178  1.000  0.80  0.00   URIBL_OB_SURBL
 77.230    79.4340   0.0208  1.000  0.60  1.00   URIBL_PJ_SURBL
  0.985     1.0126   0.0045  0.996  0.50  0.00   URIBL_AB_SURBL
 82.119    84.4600   0.1367  0.998  0.40  0.00   URIBL_WS_SURBL
  0.021     0.0216   0.0045  0.829  0.00  0.00   URIBL_PH_SURBL
(We removed FPs from both PJ and WS as a result of some of this
testing, so both should score relatively better now in terms of
FPs. The spam hit rates on SC and AB are low because this spam
corpus includes many old spams with URIs which would have rolled
off these lists. A test only on more recent days would show much
higher spam hit rates for SC and AB.)
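To put the percentages in the table above into more concrete terms, here is a small Python sketch. It assumes the S/O column is spam% divided by (spam% + ham%), which matches the WS and SC rows after rounding but is my reading of the output rather than something stated in this thread. It also converts the HAM% column into an approximate number of ham messages hit, using the 67300 ham total from the first row. The function names are illustrative only.

```python
def spam_over_overall(spam_pct: float, ham_pct: float) -> float:
    """Assumed S/O formula: the share of a rule's hits that fall on spam."""
    return spam_pct / (spam_pct + ham_pct)


def approx_ham_hits(ham_pct: float, ham_total: int = 67300) -> int:
    """Approximate count of ham messages hit, from the HAM% column."""
    return round(ham_total * ham_pct / 100.0)


# (SPAM%, HAM%) pairs copied from the table above.
rows = {
    "URIBL_PJ_SURBL": (79.4340, 0.0208),
    "URIBL_WS_SURBL": (84.4600, 0.1367),
}

for name, (spam_pct, ham_pct) in rows.items():
    print(name, round(spam_over_overall(spam_pct, ham_pct), 3), approx_ham_hits(ham_pct))
```

Read this way, PJ hit roughly 14 of the 67300 ham messages and WS roughly 92, which is the FP difference described above.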
In Raymond's own test of only spam hits, a later version of PJ
got higher detection rates than WS:
SpamAssassin tag hits: (edited to top 10)
#1 108958 BAYES_99
#2 87001 URIBL_SBL
#3 84709 URIBL_PJ_SURBL
#4 81455 HTML_MESSAGE
#5 78177 RCVD_IN_BL_SPAMCOP_NET
#6 75546 URIBL_OB_SURBL
#7 74892 URIBL_WS_SURBL
#8 64610 URIBL_SC_SURBL
#9 58190 URIBL_AB_SURBL
#10 54230 MIME_HTML_ONLY
Since WS has relatively low scores in SA 3, presumably due to
the relatively high FP rate:
> On Thu, Sep 02, 2004 at 08:09:17PM -0700, Jeff Chan wrote:
>> score URIBL_AB_SURBL 0 2.007 0 0.417
>> score URIBL_OB_SURBL 0 1.996 0 3.213
>> score URIBL_PH_SURBL 0 0.839 0 2.000
>> score URIBL_SC_SURBL 0 3.897 0 4.263
>> score URIBL_WS_SURBL 0 0.539 0 1.462
>>
>> So what do the columns above mean?
(Theo replied:)
> $ perldoc Mail::SpamAssassin::Conf
> [...]
> If four valid scores are listed, then the score that is used
> depends on how SpamAssassin is being used. The first score is used
> when both Bayes and network tests are disabled (score set 0). The
> second score is used when Bayes is disabled, but network tests are
> enabled (score set 1). The third score is used when Bayes is
> enabled and network tests are disabled (score set 2). The fourth
> score is used when Bayes is enabled and network tests are enabled
> (score set 3).
we thought it might be useful to make the PJ data available as
a separate list, at least within multi.surbl.org, the combined
SURBL. We'd like to get your comments on this.
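As an aside on the four-column score lines quoted above, this is a minimal Python sketch of the selection rule Theo quotes from the perldoc: the column used depends on whether Bayes and network tests are enabled. The function name `pick_score` is hypothetical, and the sketch only handles the one-value and four-value forms.

```python
def pick_score(score_line: str, bayes: bool, net: bool) -> float:
    """Return the score that applies for the given Bayes/network settings."""
    fields = score_line.split()
    # Accept either "score RULE_NAME a b c d" or the bare values "a b c d".
    if fields and fields[0] == "score":
        fields = fields[2:]
    values = [float(f) for f in fields]
    if len(values) == 1:
        # A single score applies regardless of configuration.
        return values[0]
    if len(values) != 4:
        raise ValueError("expected one or four scores")
    # Score set 0: neither; 1: net only; 2: Bayes only; 3: Bayes and net.
    score_set = (2 if bayes else 0) + (1 if net else 0)
    return values[score_set]


line = "score URIBL_WS_SURBL 0 0.539 0 1.462"
print(pick_score(line, bayes=False, net=True))
print(pick_score(line, bayes=True, net=True))
```

The zeros in the first and third columns are what you would expect for a network rule: with network tests disabled it cannot fire, so it contributes nothing.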
We're also wondering whether the PJ data should be taken out of
WS, or left in, if we do make PJ a distinct list. There's not
much downside in leaving PJ in WS, aside from a somewhat larger
standalone WS list. On the other hand all of our lists are
currently standalone and not deliberate subsets in terms of
data sources. But I assume most people will use multi, for which
the difference is small either way.
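For readers who have not looked at how multi combines the lists: a single lookup returns one 127.0.0.x address whose last octet is a bitmask, one bit per member list. The Python sketch below decodes such an answer. The bit values are an assumption taken from the stock SA 3.0 urirhssub rules as I remember them, so please check them against the current SURBL documentation. PJ is not included because it is not part of multi.

```python
from typing import List

# Assumed bit assignments for multi.surbl.org; verify before relying on them.
ASSUMED_BITS = {2: "SC", 4: "WS", 8: "PH", 16: "OB", 32: "AB"}


def lists_from_answer(address: str) -> List[str]:
    """Decode a 127.0.0.x answer into the names of the lists that matched."""
    octets = address.split(".")
    if len(octets) != 4 or octets[:3] != ["127", "0", "0"]:
        raise ValueError(f"unexpected answer: {address}")
    mask = int(octets[3])
    return [name for bit, name in sorted(ASSUMED_BITS.items()) if mask & bit]


print(lists_from_answer("127.0.0.20"))
```

Under these assumed values, 127.0.0.20 decodes to WS and OB (4 + 16), which shows why a separate list inside multi costs one extra bit rather than one extra query.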
By the way, please don't use PJ in production yet. If you are
rsyncing the zone files, you can mirror PJ locally from the rsync
servers now for testing purposes if you like. PJ is only being
served on a couple of public servers for our own testing, and we
don't want to overload those servers. PJ is not in multi
currently. Note that PJ is only a test list for now and may go
away.
Please comment,
Jeff C.
I would appreciate it if Dallas would allow me to mail him directly...
Apologies for the noise.
I have no other way to contact Dallas.
Thanks
Alex
-----------------
Sat 2004-09-18 21:30:38: From: alexb(a)MUNGED.CH
Sat 2004-09-18 21:30:38: To: dallase(a)nmgiMUNGED.com
Sat 2004-09-18 21:30:38: Subject: DoubleCheck
Sat 2004-09-18 21:30:38: Message-ID: <6.1.0.6.2.20040918212951.0268d918(a)inet.alexb.ch>
Sat 2004-09-18 21:30:38: MX-record resolution of [nmgi.com] in progress (DNS Server: 192.168.192.6)...
Sat 2004-09-18 21:30:39: P=010 D=nmgi.com TTL=(60) MX=[mail2.nmgi.com] {209.218.125.110}
Sat 2004-09-18 21:30:39: P=005 D=nmgi.com TTL=(60) MX=[mail1.nmgi.com] {67.67.32.200}
Sat 2004-09-18 21:30:39: Attempting MX: P=005 D=nmgi.com TTL=(60) MX=[mail1.nmgi.com] {67.67.32.200}
Sat 2004-09-18 21:30:39: Attempting SMTP connection to [67.67.32.200 : 25]
Sat 2004-09-18 21:30:39: Waiting for socket connection...
Sat 2004-09-18 21:30:39: Socket connection established (192.168.192.6 : 3407 -> 67.67.32.200 : 25)
Sat 2004-09-18 21:30:39: Waiting for protocol initiation...
Sat 2004-09-18 21:30:40: <-- 220 mailgw.nmgi.com ESMTP
Sat 2004-09-18 21:30:40: --> EHLO alexb.ch
Sat 2004-09-18 21:30:40: <-- 250-DoubleCheck Supports
Sat 2004-09-18 21:30:40: <-- 250-PIPELINING
Sat 2004-09-18 21:30:40: <-- 250-8BITMIME
Sat 2004-09-18 21:30:40: <-- 250 AUTH LOGIN PLAIN
Sat 2004-09-18 21:30:40: --> MAIL From:<alexb(a)MUNGED.CH>
Sat 2004-09-18 21:30:40: <-- 250 ok
Sat 2004-09-18 21:30:40: --> RCPT To:<dallase(a)nmgi.MUNGEDcom>
Sat 2004-09-18 21:30:40: <-- 554 sorry, your envelope sender is in my badmailfrom list (#5.7.1)
Sat 2004-09-18 21:30:40: SMTP session terminated (Bytes in/out: 181/72)
Sat 2004-09-18 21:30:40: ----------
On Friday, September 17, 2004, 4:24:37 PM, Bill Landry wrote:
> This is a list that MailPolice hosts and I have been running it for a few
> hours and it has already flagged some phish and fraud e-mails. Here is some
> info about the list: http://rhs.mailpolice.com/#rhsfraud
> This is my configuration for SA 2.64 with the SpamCopURI plug-in:
> uri MP_URI_RBL eval:check_spamcop_uri_rbl('fraud.rhs.mailpolice.com','127.0.0.2')
> describe MP_URI_RBL URI's domain appears in MailPolice fraud list
> tflags MP_URI_RBL net
> score MP_URI_RBL 2.0
> And for SA 3.0 with the URIDNSBL plug-in:
> urirhsbl URIBL_MP fraud.rhs.mailpolice.com. A
> header URIBL_MP eval:check_uridnsbl('URIBL_MP')
> describe URIBL_MP URI's domain appears in MailPolice fraud list
> tflags URIBL_MP net
> score URIBL_MP 2.0
> Bill
Thanks for finding that additional phishing data Bill!
It may be worth noting that we're working on including these
into PH inside multi.surbl.org, and that we have 15 times more
name servers than mailpolice.com does. ;-)
Jeff C.
--
Jeff Chan
mailto:jeffc@surbl.org
http://www.surbl.org/
I have a customer who keeps tripping catholicexchangeMUNGED.com
Can we get this removed from WS SURBL?
Frederic Tarasevicius
Internet Information Services, Inc.
http://www.i-is.com/
810-794-4400
mailto:info@i-is.com
Jeff & Co.
Just scored this a while ago.
* 3.0 WS_URI_RBL URI's domain appears in ws database at ws.surbl.org
* [www.paypal.com [in URI RBL at multi.surbl.org]
In the original msg there's a weird character after the .com which I
cannot explain.
Maybe a space?
Could you pls check where this may be showing up (not in my local zone :)
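One way to pin down a stray character like this is to dump the code points of the extracted host name. The Python sketch below flags anything that is not an ASCII letter, digit, hyphen, or dot. The sample string with a trailing no-break space is purely hypothetical; the actual character in the message is not known.

```python
import unicodedata
from typing import List, Tuple


def suspicious_chars(host: str) -> List[Tuple[int, str, str]]:
    """List (index, code point, Unicode name) for characters unusual in a host name."""
    found = []
    for index, ch in enumerate(host):
        plain = ch.isascii() and (ch.isalnum() or ch in "-.")
        if not plain:
            name = unicodedata.name(ch, "UNNAMED")
            found.append((index, f"U+{ord(ch):04X}", name))
    return found


# Hypothetical sample: a host name followed by a no-break space.
sample = "www.paypal.com\u00a0"
for entry in suspicious_chars(sample):
    print(entry)
```

Running the same check on the host string taken from the raw message would show whether the trailing character is an ordinary space, a control character, or something else.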
thanks
Alex
Only one I have is this:
www1-paypalSUPERMUNGEDFORYOURPROTECTION.com
Sorry :)
www1-paypal.com
>-----Original Message-----
>From: Alex Broens [mailto:surbl@alexb.ch]
>Sent: Friday, September 17, 2004 10:22 AM
>To: SURBL Discussion list
>Subject: [SURBL-Discuss] Pls check: paypal
>
>
>Jeff & Co.
>
>Just scored this a while ago.
>
> * 3.0 WS_URI_RBL URI's domain appears in ws database
>at ws.surbl.org
> * [www.paypal.com [in URI RBL at multi.surbl.org]
>
>In the original msg there's a weird character after the
>.com which I
>cannot explain.
>
>Maybe a space?
>
>Could you pls check where this may be showing up (not in my
>local zone :)
>
>thanks
>
>Alex