I'm trying to find a good way to get the alexa.com data directly off of their web site. Their sub-categories say they can be listed by popularity, but they give some very weird mixes of rankings.
Another interesting thing is to try searching Google.com using the following:
site:alexa.com "traffic rank for" "related info"
and
site:alexa.com "traffic rank for" "related info" business
(or add category or other keyword to the end)
The actual domain is embedded in the page titles. But without some automation, it would still be a lot of work to screen-scrape these, and the results are still rather randomly ordered.
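For what it's worth, here is a rough Python sketch of how the harvesting step could be automated once the result titles have been saved to a list. It does not query Google itself. The function name, the sample titles, and the loose domain pattern are my own assumptions, since I have not checked the exact title format; treat it as a starting point rather than a finished tool.

```python
import re

# Loose hostname pattern (an assumption, not a full RFC-grade validator):
# one or more dot-terminated labels followed by an alphabetic TLD.
DOMAIN_RE = re.compile(
    r"\b((?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z]{2,})\b",
    re.IGNORECASE,
)

def domains_from_titles(titles, exclude=("alexa.com",)):
    """Return unique domains found in page titles, in first-seen order."""
    seen = set()
    result = []
    for title in titles:
        for match in DOMAIN_RE.findall(title):
            domain = match.lower()
            # Treat www.example.com and example.com as the same entry.
            if domain.startswith("www."):
                domain = domain[4:]
            if domain in exclude or domain in seen:
                continue
            seen.add(domain)
            result.append(domain)
    return result

# Hypothetical titles, made up for illustration only.
sample_titles = [
    "Related Info for: example.com",
    "Traffic rank for www.Example.org - alexa.com",
    "no domain here",
]
print(domains_from_titles(sample_titles))
```

This only removes duplicates and the alexa.com host itself. It does nothing about the random ordering of the results.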
ANOTHER GOOD RESOURCE:
http://www.port80software.com/surveys/top1000webservers/
This was a survey of large companies' web servers. If you look at the source for these pages, it would be easy to parse the domains from these mere 12 pages. But this list might not be any better than the lists you have already checked against in the past.
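A minimal sketch of that parsing, assuming the domains show up as ordinary links (href attributes) in the saved page source. I have not verified that against the actual markup, so the approach may need adjusting. The class and function names are made up for this example, and only the Python standard library is used.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

class LinkHostCollector(HTMLParser):
    """Collect unique hostnames from <a href="..."> tags, in page order."""

    def __init__(self, skip_hosts=()):
        super().__init__()
        self.skip_hosts = set(skip_hosts)
        self.hosts = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name != "href" or not value:
                continue
            # Relative links have no hostname and are skipped.
            host = urlparse(value).hostname
            if not host:
                continue
            host = host.lower()
            if host.startswith("www."):
                host = host[4:]
            if host not in self.skip_hosts and host not in self.hosts:
                self.hosts.append(host)

def hosts_from_html(html, skip_hosts=("port80software.com",)):
    """Parse one page of HTML source and return the linked hostnames."""
    collector = LinkHostCollector(skip_hosts)
    collector.feed(html)
    return collector.hosts

# Hypothetical page fragment, made up for illustration only.
sample_html = (
    '<a href="http://www.example.com/">Example</a>'
    '<a href="/surveys/">relative link</a>'
    '<a href="http://www.port80software.com/about">site link</a>'
    '<a href="https://Example.net">Another</a>'
)
print(hosts_from_html(sample_html))
```

Running hosts_from_html over each of the 12 saved pages and merging the results would give one combined list.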
I'm still nowhere near a list as large and valuable as the Alexa top 20,000 would have been. I'm still looking :)
Rob McEwen