[SURBL-Discuss] RFC: SURBL inclusion policy

Ryan Thompson ryan at sasknow.com
Tue Sep 28 06:18:42 CEST 2004


Jeff Chan wrote to SURBL Discuss:

> Heh, when I said "normal", statisticians jumped all over that.

:-)

> Turns out the distributions may be more like Zipfian.  Zipf curves
> have most of the data concentrated in a small amount of the curve
> (e.g., young domains) and a small amount of the data in a larger part
> of the curve (e.g., old domains).  I hope I'm explaining that
> correctly.
>
> That said, if you found some numerical heuristics that fit
> the data well, that's great!

Yup, my function seemed to fit the data I had at the time quite nicely.
However, I do plan to work on the scoring in more detail. GetURI
is currently in a huge growth spurt: it is gaining a number of new
relevant tests, and it is finally getting up to speed with what people
are already doing to classify domains. Once that settles down a bit,
I'll probably look more closely at scoring.  Right now, though, the score
is definitely quite a useful metric at the extremes (top/bottom of
output). It's weak in the middle ground, but, then again, we all know
the middle ground is damned hard even for humans.  :-)
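
As an aside, here is a minimal sketch of the Zipf shape Jeff describes.
It assumes an idealized 1/rank curve with exponent 1 over 1000 ranks;
both numbers are arbitrary, and nothing here comes from real SURBL or
GetURI data. The helper names are made up for illustration.

```python
# Illustrative only: an idealized Zipf curve, not real SURBL or GetURI data.

def zipf_weights(n, s=1.0):
    """Return normalized Zipf weights for ranks 1..n (rank k gets ~ 1/k**s)."""
    raw = [1.0 / (k ** s) for k in range(1, n + 1)]
    total = sum(raw)
    return [w / total for w in raw]

def head_share(weights, fraction):
    """Share of the total mass held by the top `fraction` of ranks."""
    cutoff = max(1, int(len(weights) * fraction))
    return sum(weights[:cutoff])

weights = zipf_weights(1000)
for fraction in (0.01, 0.10, 0.50):
    share = head_share(weights, fraction)
    print(f"top {fraction:.0%} of ranks hold {share:.1%} of the mass")
```

The point is only the shape: a small head of the curve carries most of
the mass, and the long tail carries comparatively little.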

- Ryan

-- 
   Ryan Thompson <ryan at sasknow.com>

   SaskNow Technologies - http://www.sasknow.com
   901-1st Avenue North - Saskatoon, SK - S7K 1Y4

         Tel: 306-664-3600   Fax: 306-244-7037   Saskatoon
   Toll-Free: 877-727-5669     (877-SASKNOW)     North America

