Jeff Chan wrote to SURBL Discuss:
> Heh, when I said "normal", statisticians jumped all over that.
> :-)
> Turns out the distributions may be closer to Zipfian. Zipf curves have most of the data concentrated in a small part of the curve (e.g., young domains) and a small amount of the data spread over a much larger part of the curve (e.g., old domains). I hope I'm explaining that correctly.
> That said, if you found some numerical heuristics that fit the data well, that's great!
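As a rough illustration of the concentration described above, here is a minimal Python sketch that computes what share of the total mass a Zipf law puts in its top ranks. It uses no SURBL or GetURI data. The function name `zipf_head_share`, the exponent `s = 1`, and the 1,000-rank size are all hypothetical choices made for illustration.

```python
def zipf_head_share(n_ranks: int, head_fraction: float, s: float = 1.0) -> float:
    """Share of total Zipf mass held by the top `head_fraction` of ranks.

    Assumes (for illustration only) weights proportional to 1 / k**s
    for ranks k = 1..n_ranks.
    """
    weights = [1.0 / (k ** s) for k in range(1, n_ranks + 1)]
    # Number of ranks counted as the "head" of the curve (at least one).
    head = max(1, int(n_ranks * head_fraction))
    return sum(weights[:head]) / sum(weights)


if __name__ == "__main__":
    # Hypothetical example: 1,000 ranks, exponent s = 1.
    share = zipf_head_share(1000, 0.10)
    print(f"Top 10% of ranks hold {share:.0%} of the mass")
```

Under these assumptions, roughly 69% of the mass lands in the top 10% of ranks, while a flat distribution (`s = 0`) would put exactly 10% there. That gap is the "most of the data in a small part of the curve" effect.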
Yup, my function seems to fit the data I had at the time quite nicely. However, I do plan to work on the scoring in more detail. GetURI is currently going through a huge growth spurt as different relevant tests are added, and it is finally getting up to speed with what people are already doing to classify domains. Once that settles down a bit, I'll probably look more closely at scoring. Right now, though, it is definitely a useful metric at the extremes (the top and bottom of the output). It's weak in the middle ground, but then again, we all know the middle ground is damned hard even for humans. :-)
- Ryan