Hi Jeff
I know that not all FPs are reported and there are probably no exact numbers, but the reports should still give a good idea. Or am I wrong?
The FP reports are probably too few overall to be meaningful in terms of differentiating performance between lists. There just aren't that many, maybe a few a day on average.
Yes, but I wasn't thinking of differentiating between the lists; there are other results for that. What I was thinking of was the number of FPs that exist on more than one list. This is very useful information when combining lists. If almost no FPs occur on more than one list (at the same time), requiring appearance on at least 2 lists would be a very safe policy.
Good point. Anecdotally, FPs don't tend to appear on multiple lists very often, at least the FPs we've seen reported. This is unmeasured, just a subjective opinion. If we had some of the list data in combined form as I had proposed then we could test it better. I suppose I could just do it. ;-)
If the reported ones are very rare, this would probably be even more the case for the unreported ones. If there's an FP, the chance of it being reported will grow if it is on more than one list.
Mmm, the combined lists just have to be available to someone with a big ham corpus to test it.
Personally, knowing the results for "at least 2" or "at least 3" would be nice. It would also be nice to know how those combinations would show up in: http://www.surbl.org/permuted-hits.out.txt
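
For what it's worth, the "at least N lists" test could be sketched roughly like this in Python. This is only an illustration: the list names, the domains and the helper names (listing_counts, hits_at_least) are all made up, and real input would be the combined list data plus the domains extracted from a ham corpus.

```python
from collections import Counter


def listing_counts(lists):
    """Count, for every domain, how many lists it appears on."""
    counts = Counter()
    for domains in lists.values():
        # set() guards against a domain being repeated within one list
        counts.update(set(domains))
    return counts


def hits_at_least(ham_domains, lists, k):
    """Return the ham domains (i.e. FPs) that are on at least k lists."""
    counts = listing_counts(lists)
    return sorted(d for d in set(ham_domains) if counts[d] >= k)


# Made-up sample data, NOT real list contents.
lists = {
    "list_a": {"spam1.example", "shop.example", "news.example"},
    "list_b": {"spam1.example", "spam2.example", "news.example"},
    "list_c": {"spam1.example", "spam2.example"},
}
# Domains seen in a (hypothetical) ham corpus.
ham = {"shop.example", "news.example", "blog.example"}

for k in (1, 2, 3):
    fps = hits_at_least(ham, lists, k)
    print(f"at least {k} list(s): {len(fps)} FP(s) {fps}")
```

With this toy data, two ham domains hit when a single listing is enough, one when two listings are required, and none when three are required. Whether real data behaves the same way is exactly what would need testing.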
Alain