whether I was seeing that with 0.10 though. I've seen cases where one message causes 20 or more lookups for the "same" DNS record.
I think I've worked out what is happening. Basically, each different variation of a subdomain URL found in a message causes a separate lookup, even though the base domains actually being looked up are the same. For example, I made a test message that looked like this:
http://serbserb.testdomain.co.nz/blah http://sebserbr.testdomain.co.nz/blah http://bsertbse.testdomain.co.nz/blah http://srtnsrtn.testdomain.co.nz/blah http://nrtnsrtn.testdomain.co.nz/blah http://saerbsee.testdomain.co.nz/blah http://rtndrtsn.testdomain.co.nz/blah http://nrtndrtn.testdomain.co.nz/blah http://sdfgserg.testdomain.co.nz/blah http://bcvcvbcx.testdomain.co.nz/blah http://ergsergh.testdomain.co.nz/blah http://qwertybe.testdomain.co.nz/blah http://lphtrhtr.testdomain.co.nz/blah http://bxdfbgnf.testdomain.co.nz/blah http://ergerger.testdomain.co.nz/blah http://cbxcvbxc.testdomain.co.nz/blah http://tyjftyjt.testdomain.co.nz/blah http://awefawfe.testdomain.co.nz/blah http://awefawef.testdomain.co.nz/blah http://awefawef.testdomain.co.nz/blah
Each URL has a randomized subdomain in front of the actual domain. Many spams with lots of image links (ones selling printer cartridges, etc.) effectively do this.
Each URL above generated a DNS lookup for testdomain.co.nz.sc.surbl.org and co.nz.sc.surbl.org, so a total of 40 DNS lookups just for the sc list. I'm using the ws and be lists too, so that's a total of 120 DNS lookups generated by an email with 20 randomized URLs :(
Luckily, local DNS caching largely offsets the problem, but it would be good to avoid it in the first place. As each URL is stripped down, the stripped names need to be collected into a list with duplicates removed before any DNS queries are made... extra coding, I guess...
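The de-duplication described above could be sketched roughly like this in Python. It is only an illustration, not the plugin's actual code: `candidate_domains`, `unique_queries`, and `SURBL_ZONES` are made-up names, and querying the last two and last three labels of each hostname is an assumption based on the lookups observed above.

```python
from urllib.parse import urlsplit

# Assumed zone names, taken from the lists mentioned in the message (sc, ws, be).
SURBL_ZONES = ("sc.surbl.org", "ws.surbl.org", "be.surbl.org")


def candidate_domains(url):
    """Return the last two and last three labels of the URL's hostname.

    Assumption: this mirrors the observed lookups for co.nz and
    testdomain.co.nz; the real stripping rules may differ.
    """
    host = (urlsplit(url).hostname or "").rstrip(".")
    labels = host.split(".")
    return [".".join(labels[-n:]) for n in (2, 3) if len(labels) >= n]


def unique_queries(urls, zones=SURBL_ZONES):
    """Build the DNS names to query, dropping duplicates but keeping order."""
    seen = set()
    queries = []
    for url in urls:
        for domain in candidate_domains(url):
            for zone in zones:
                name = f"{domain}.{zone}"
                if name not in seen:
                    seen.add(name)
                    queries.append(name)
    return queries
```

Under these assumptions, the test message above yields two distinct names per list, no matter how many randomized subdomains it contains.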
I can add something that caches the query results on a per-test basis, so the above scenario should be knocked down to just 3 queries instead of 120. I have been a little hesitant to cache misses, since a miss could become a hit later on, but since I would only be caching per test this shouldn't be an issue.
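A minimal sketch of that per-test cache, in Python for illustration only (the class name `PerTestCache`, its `lookup` method, and the use of `socket.gethostbyname` as the resolver are all assumptions, not the actual implementation). Misses are stored as `None`, which is safe here because the cache is thrown away when the test finishes.

```python
import socket


class PerTestCache:
    """Caches DNS answers (hits and misses) for the lifetime of one test."""

    def __init__(self, resolver=None):
        # The resolver is injectable so the cache can be exercised offline.
        self._resolver = resolver or self._dns_lookup
        self._results = {}
        self.queries_sent = 0

    @staticmethod
    def _dns_lookup(name):
        """Return the resolved address, or None when the name is not listed."""
        try:
            return socket.gethostbyname(name)
        except OSError:
            return None  # treated as a miss

    def lookup(self, name):
        """Resolve a name, sending a real query only the first time it is seen."""
        if name not in self._results:
            self.queries_sent += 1
            self._results[name] = self._resolver(name)
        return self._results[name]
```

A fresh `PerTestCache()` would be created for each test on each message, so a name that is unlisted now is queried again the next time a message is scanned.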
--eric
Regards, Simon