Jeff Chan wrote:
I'd be a little concerned about the new DNS queries it would generate.
If load on the servers is a concern there are things that can be done:
* Cache lookups for the duration of the session
* Only look up the main document URL, not src= URLs
* Allow a URL whitelist... bookmarked URLs
* Skip lookups for HTTP servers on the local subnet or on My Computer
etc.
Ideas welcome.
On Monday, February 13, 2006, 1:38:39 PM, Matthew Eerde wrote:
Jeff Chan wrote:
I'd be a little concerned about the new DNS queries it would generate.
If load on the servers is a concern there are things that can be done:
- Cache lookups for the duration of the session
- Only look up the main document URL, not src= URLs
- Allow a URL whitelist... bookmarked URLs
- Skip lookups for HTTP servers on the local subnet or on My Computer
All of those should be considered. Caching helps a lot and is highly recommended.
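For concreteness, here is a minimal sketch in Python of how those rules might fit together: a session-scoped cache, a whitelist and local-address skip check, and a SURBL-style lookup against the multi.surbl.org zone. The function names, the whitelist contents, and the simplistic host handling are assumptions for illustration only; a real client would also reduce hostnames to their registered domains before querying, which this sketch skips.

    import ipaddress
    import socket
    from urllib.parse import urlsplit

    # Session-scoped cache: hostname -> True (listed) / False (not listed).
    # Entries live only for the lifetime of the browsing session.
    _session_cache = {}

    # Hypothetical skip list (bookmarked / well-known hosts that are never queried).
    _whitelist = {"w3.org", "www.w3.org"}

    def _is_local(host):
        # Approximate "local subnet or My Computer": loopback, private, link-local.
        if host in ("", "localhost"):
            return True
        try:
            addr = ipaddress.ip_address(host)
        except ValueError:
            return False  # ordinary hostname, not an IP literal
        return addr.is_loopback or addr.is_private or addr.is_link_local

    def should_check(url, main_document=True):
        # Apply the skip rules before generating any nameserver query.
        if not main_document:  # only the main document URL, not src= URLs
            return False
        host = urlsplit(url).hostname or ""
        return host not in _whitelist and not _is_local(host)

    def check_surbl(url, zone="multi.surbl.org"):
        # Look up the URL's host in a SURBL-style DNS list, caching per session.
        if not should_check(url):
            return False
        host = urlsplit(url).hostname or ""
        if host in _session_cache:
            return _session_cache[host]
        try:
            # SURBL answers listed names with a 127.0.0.x A record; any answer
            # at all is treated as "listed" in this simplified sketch.
            socket.gethostbyname("%s.%s" % (host, zone))
            listed = True
        except socket.gaierror:
            listed = False
        _session_cache[host] = listed
        return listed

Used at page-load time this would amount to one check_surbl(page_url) per main document, with everything else short-circuited by the skip rules or the cache.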
The whitelist that SpamAssassin took a snapshot of is a reasonable starting point:
http://spamassassin.apache.org/full/3.1.x/dist/rules/25_uribl.cf
IIRC it covered about half of the whitelist hits at the time. In other words, skipping those sites cut the lookups against mostly-whitehat sites by roughly 50%, a desirable result in reducing unnecessary nameserver queries.
Many web pages include a w3.org URI as a reference for HTML, etc. There's no point in checking it many billions of times per day, for example.
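If that snapshot were reused on the client side, the simplest approach might be to read its skip entries straight into the whitelist from the sketch above. This assumes the file uses SpamAssassin-style uridnsbl_skip_domain lines with space-separated domains; the parsing here is an assumption about the format, not a tested parser.

    def load_skip_domains(path="25_uribl.cf"):
        # Collect domains from lines like:
        #   uridnsbl_skip_domain example.com example.net
        # Comments and unrelated directives are ignored.
        skip = set()
        with open(path) as cf:
            for line in cf:
                line = line.split("#", 1)[0].strip()
                if line.startswith("uridnsbl_skip_domain"):
                    skip.update(line.split()[1:])
        return skip

    # e.g. merge into the skip list from the earlier sketch:
    # _whitelist |= load_skip_domains()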
Cheers,
Jeff C.
--
Don't harm innocent bystanders.