Re: [SURBL-Discuss] RFC: SURBL inclusion policy

28 Sep 2004


      Jeff Chan wrote to SURBL Discuss:
...
Heh, when I said "normal", statisticians jumped all over that.
:-)
...
Turns out the distributions may be more like Zipfian.  Zipf curves
have most of the data concentrated in a small amount of the curve
(e.g., young domains) and a small amount of the data in a larger part
of the curve (e.g., old domains).  I hope I'm explaining that
correctly.
That said, if you found some numerical heuristics that fit
the data well, that's great!
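
For anyone following along, here is a rough sketch of the Zipfian shape described above. It is illustrative only and does not use GetURI or SURBL data: the exponent s = 1, the 1000 ranks, and the helper names zipf_weights and head_share are all assumptions made for the example. It computes how much of the total mass sits in the top ranks.

```python
def zipf_weights(n, s=1.0):
    """Normalized Zipf weights for ranks 1..n; rank k gets weight proportional to 1/k**s."""
    raw = [1.0 / (k ** s) for k in range(1, n + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def head_share(weights, fraction):
    """Share of the total mass held by the top `fraction` of ranks."""
    cutoff = max(1, int(len(weights) * fraction))
    return sum(weights[:cutoff])


# Assumed parameters: 1000 ranks, exponent 1 (the classic Zipf case).
weights = zipf_weights(1000)
top10 = head_share(weights, 0.10)
print(f"top 10% of ranks hold {top10:.0%} of the mass")
```

With these assumed parameters, well over half of the mass lands in the top tenth of the ranks, which is the "most of the data in a small part of the curve" behaviour mentioned above. Real domain-age data may follow a different exponent or not be Zipfian at all.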

Yup, my function seems to fit quite nicely to the data I had at the
time. However, I do plan to work on the scoring in more detail. GetURI
is currently in a huge growth spurt with the advent of different
relevant tests, and finally getting up to speed with what people are
already doing to classify domains. Once that settles down a bit, I'll
probably look more closely at scoring.  Right now, though, it is
definitely quite a useful metric at the extremes (top/bottom of output).
It's weak in the middle ground, but, then again, we all know the middle
ground is damned hard enough for humans.  :-)
- Ryan
-- 
   Ryan Thompson ryan@sasknow.com

   SaskNow Technologies - http://www.sasknow.com
   901-1st Avenue North - Saskatoon, SK - S7K 1Y4

         Tel: 306-664-3600   Fax: 306-244-7037   Saskatoon
   Toll-Free: 877-727-5669     (877-SASKNOW)     North America
