On Tuesday, April 19, 2005, 6:43:56 AM, Alex Broens wrote:
Guys,
Jeff's little XtraSmall (CBL Data) lust, ehhhmm, list seems to be catching quite a bit of trash.
give it a try:
urirhsbl URIBL_XS_SURBL xs.surbl.org. A 2
body URIBL_XS_SURBL eval:check_uridnsbl('URIBL_XS_SURBL')
describe URIBL_XS_SURBL URL listed in XS SURBL - Testing
tflags URIBL_XS_SURBL net
score URIBL_XS_SURBL 1.5
Score to your taste !!!!!
Pretty sure Jeff is anxious to get FP reports.
h2h
Alex
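For readers new to URI blocklists, here is a minimal sketch of the DNS question a rule like the one above asks. It assumes the usual SURBL convention that a domain is checked by resolving `<domain>.xs.surbl.org`, with a listing returned as a 127.0.0.x address. Treating the trailing `2` in the config as a bitmask tested against the last octet is an assumption, and the helper names `query_name`, `is_listed_answer` and `lookup` are hypothetical.

```python
# Hypothetical sketch of a SURBL-style lookup. Assumptions (not from the post):
# the query name is "<domain>.xs.surbl.org", a listing answers 127.0.0.x,
# and the config's trailing "2" is a bitmask applied to x.
import socket

ZONE = "xs.surbl.org"  # zone taken from the urirhsbl line
LISTED_MASK = 2        # assumed meaning of the "2" in "A 2"


def query_name(domain: str, zone: str = ZONE) -> str:
    """Build the DNS name to resolve, e.g. example.com.xs.surbl.org."""
    return f"{domain.strip('.').lower()}.{zone}"


def is_listed_answer(address: str, mask: int = LISTED_MASK) -> bool:
    """True if the answer is 127.0.0.x and x has the assumed mask bit set."""
    parts = address.split(".")
    if len(parts) != 4 or parts[:3] != ["127", "0", "0"]:
        return False
    return (int(parts[3]) & mask) != 0


def lookup(domain: str) -> bool:
    """Live lookup; a resolution failure (e.g. NXDOMAIN) means 'not listed'."""
    try:
        return is_listed_answer(socket.gethostbyname(query_name(domain)))
    except socket.gaierror:
        return False
```

Only `lookup` touches the network; the other two helpers can be checked offline, for example `is_listed_answer("127.0.0.2")` is true while `is_listed_answer("127.0.0.4")` is false under the assumed mask.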
Thanks much Alex!
BTW, the list is about 1k records at the current cutoff levels. It may be a little misleading to talk about 100 new records, because the total number of records in each category is greater:
Without any processing, the current list has about 9k records:
8781 8781 111633 cbl-domains.all
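The count lines appear to be `wc` output: lines, words and bytes, then the file name. With one domain per line, the line and word counts match, which is why each number appears twice. A small hypothetical helper that reproduces those three columns:

```python
# Hypothetical helper mirroring the three counts `wc` prints:
# newline count, whitespace-separated word count, and byte count.
def wc_counts(data: bytes) -> tuple[int, int, int]:
    lines = data.count(b"\n")   # wc -l counts newline characters
    words = len(data.split())   # tokens separated by whitespace
    return lines, words, len(data)


# Two domains, one per line: lines == words, as in the lists above.
print(wc_counts(b"example.com\nexample.net\n"))
```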
Taking the 97th percentile of volume-ranked hits gives:
565 565 8416 cbl-domains.percentiled
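One way to read "97th percentile of volume-ranked hits" is: rank domains by hit count, then keep the top-ranked ones until they cover 97 percent of all hits. The sketch below assumes that reading; the function name `volume_cut` and the sample counts are hypothetical, and the actual processing may differ.

```python
from typing import Dict, List


def volume_cut(hits: Dict[str, int], percent: int = 97) -> List[str]:
    """Keep the highest-volume domains until they cover `percent` of all hits.

    Assumed interpretation of the percentile step; integer arithmetic avoids
    floating-point edge cases in the threshold comparison.
    """
    total = sum(hits.values())
    # Most hits first; ties broken by name so the result is deterministic.
    ranked = sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))
    kept: List[str] = []
    covered = 0
    for domain, count in ranked:
        if covered * 100 >= percent * total:
            break  # already at or above the target share of hit volume
        kept.append(domain)
        covered += count
    return kept


sample = {"a.example": 50, "b.example": 30, "c.example": 15,
          "d.example": 3, "e.example": 1, "f.example": 1}
print(volume_cut(sample))  # the long tail of 1-hit domains is dropped
```

With a skewed distribution like the sample, a handful of records reaches the 97 percent target and the many low-hit records in the tail are cut.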
The intersection of all records with existing SURBLs is:
906 906 13459 cbl-domains.surbl
And the union of the percentiled and SURBL-intersected records, after whitelisting, is:
991 991 14817 cbl-domains.afterwhitelist
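Because 991 records is more than the 565 percentiled records, the final list cannot be a plain intersection of the percentiled set with anything. The sketch below therefore assumes the final step is a union of the percentiled records with the SURBL-intersected records, followed by removal of whitelisted domains. The function name `combine` and the sample domains are hypothetical.

```python
from typing import Set, Tuple


def combine(all_domains: Set[str], percentiled: Set[str],
            surbl_listed: Set[str], whitelist: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Assumed combination step; returns (SURBL intersection, final list)."""
    on_surbl = all_domains & surbl_listed   # records already on existing SURBLs
    merged = percentiled | on_surbl         # assumed union of the two sources
    return on_surbl, merged - whitelist     # drop whitelisted (ham) domains


everything = {"a.example", "b.example", "c.example", "d.example", "e.example"}
top = {"a.example", "b.example"}
surbls = {"b.example", "c.example", "z.example"}
white = {"c.example"}
print(combine(everything, top, surbls, white))
```

A union can be larger than either input, which is consistent with the counts above; an intersection never could be.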
Keeping 1k records out of 9k may seem like we're losing a lot, but the distributions look Zipfian: a few records get many hits and many records get only a few hits, so there's a lot of "noise" down in the "few hits" range which may not be very usable. And even at this conservative setting, we're getting 97 percent of the CBL URI trap hits by volume, which can't be too bad.
Cheers,
Jeff C. -- "If it appears in hams, then don't list it."