(I've moved this message from the SA list to the SURBL list where it's more relevant and won't get lost in the noise...)
At 06:14 22/04/2004, Dallas L. Engelken wrote:
I have just released SpamCopURI version 0.11. This fixes a
few bugs
that had been reported and adds open redirect resolution.
[...]
Just installed it... Can you tell me what is up with this.
@400000004086b7c400ac051c debug: Query failed for thegolfchannel.com.ws.surbl.org
@400000004086b7c400ad2244 debug: querying for www.thegolfchannel.com.ws.surbl.org
@400000004086b7c400ad262c
@400000004086b7c400d251cc debug: Query failed for www.thegolfchannel.com.ws.surbl.org
@400000004086b7c400d74b3c debug: querying for thegolfchannel.com.ws.surbl.org
@400000004086b7c400d7530c
@400000004086b7c400f8d144 debug: Query failed for thegolfchannel.com.ws.surbl.org
@400000004086b7c400f9ea84 debug: querying for www.thegolfchannel.com.ws.surbl.org
@400000004086b7c400f9f254
@400000004086b7c4011e6e2c debug: Query failed for www.thegolfchannel.com.ws.surbl.org
@400000004086b7c40123d8e4 debug: querying for thegolfchannel.com.ws.surbl.org
@400000004086b7c40123e0b4
@400000004086b7c4014c5814 debug: Query failed for thegolfchannel.com.ws.surbl.org
@400000004086b7c4014d7924 debug: querying for www.thegolfchannel.com.ws.surbl.org
@400000004086b7c4014d7d0c
@400000004086b7c401729524 debug: Query failed for www.thegolfchannel.com.ws.surbl.org
@400000004086b7c401777724 debug: querying for thegolfchannel.com.ws.surbl.org
@400000004086b7c401777ef4
@400000004086b7c401993f94 debug: Query failed for thegolfchannel.com.ws.surbl.org
@400000004086b7c4019a648c debug: querying for www.thegolfchannel.com.ws.surbl.org
@400000004086b7c4019a6c5c
@400000004086b7c401bec124 debug: Query failed for www.thegolfchannel.com.ws.surbl.org
@400000004086b7c401c3a324 debug: querying for thegolfchannel.com.ws.surbl.org
It tried to query something like 20 times before it finally stopped. Does 'Query failed' actually mean the query failed, or just that no A record was found? If I send a test from the command line with a message that contains a URI on both lists, it works fine.
[root@localhost service]# echo -e 'From: dallase\n\n<a href="http://8006hosting.com">click here</A>' | spamc
...
 * 3.0 SC_URI_RBL Contains a URL listed in the SC SURBL blocklist
 * 2.5 WS_URI_RBL Contains a URL listed in the WS SURBL blocklist
...
Do I need a new DNS::Resolver or is this normal behavior?
I'm seeing the same thing with SpamCopURI-0.12 as well; I don't remember whether I was seeing it with 0.10 though. I've seen cases where one message causes 20 or more lookups for the "same" DNS record.
I think I've worked out what is happening. Basically, each different subdomain variation in the URLs found in a message causes a separate lookup, even though the base domains that are actually being looked up are the same. For example, I made a test message that looked like this:
http://serbserb.testdomain.co.nz/blah
http://sebserbr.testdomain.co.nz/blah
http://bsertbse.testdomain.co.nz/blah
http://srtnsrtn.testdomain.co.nz/blah
http://nrtnsrtn.testdomain.co.nz/blah
http://saerbsee.testdomain.co.nz/blah
http://rtndrtsn.testdomain.co.nz/blah
http://nrtndrtn.testdomain.co.nz/blah
http://sdfgserg.testdomain.co.nz/blah
http://bcvcvbcx.testdomain.co.nz/blah
http://ergsergh.testdomain.co.nz/blah
http://qwertybe.testdomain.co.nz/blah
http://lphtrhtr.testdomain.co.nz/blah
http://bxdfbgnf.testdomain.co.nz/blah
http://ergerger.testdomain.co.nz/blah
http://cbxcvbxc.testdomain.co.nz/blah
http://tyjftyjt.testdomain.co.nz/blah
http://awefawfe.testdomain.co.nz/blah
http://awefawef.testdomain.co.nz/blah
http://awefawef.testdomain.co.nz/blah
Each URL has a randomized subdomain in front of the actual domain. Many spams with lots of image links (the ones selling printer cartridges, etc.) effectively do this.
Each URL above generated a DNS lookup for testdomain.co.nz.sc.surbl.org and co.nz.sc.surbl.org, so a total of 40 DNS lookups just for the sc list. I'm also using the ws and be lists, so that's a total of 120 DNS lookups generated by an email with 20 randomized URLs :(
Luckily local DNS caching largely offsets the problem, but it would be good to avoid it in the first place. Somehow, as each URL is stripped down, a list of stripped names needs to be built with duplicates removed before doing the DNS queries... extra coding, I guess...
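Something along these lines is what I have in mind - just an untested sketch using Net::DNS and a deliberately naive two-versus-three-label guess for the base domain, not SpamCopURI's actual code:

#!/usr/bin/perl
# Untested sketch: strip each URL down to its base domain, de-duplicate,
# and only then do the DNS queries, so 20 randomized subdomains of the
# same domain cost one lookup per list instead of 20.
use strict;
use warnings;
use URI;
use Net::DNS;

my @urls = map { "http://rand$_.testdomain.co.nz/blah" } 1 .. 20;

my %to_query;
for my $url (@urls) {
    my $host   = URI->new($url)->host;
    my @labels = split /\./, $host;
    # Very naive: keep three labels when the second-to-last label looks
    # like a two-level ccTLD component (co.nz etc.), otherwise keep two.
    my $keep = (@labels > 2 && $labels[-2] =~ /^(?:co|com|net|org|ac)$/) ? 3 : 2;
    my $base = join '.', @labels[-$keep .. -1];
    $to_query{$base} = 1;                      # duplicates collapse here
}

my $res = Net::DNS::Resolver->new;
for my $base (sort keys %to_query) {
    for my $list (qw(sc.surbl.org ws.surbl.org be.surbl.org)) {
        my $reply = $res->query("$base.$list", 'A');
        printf "%-40s %s\n", "$base.$list", $reply ? 'LISTED' : 'not listed';
    }
}

With the 20 URLs above that comes out as one base domain and three queries in total, instead of 120.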
Regards, Simon
[I replied on the SA list also]
On Wednesday, April 21, 2004, 6:14:51 PM, Simon Byrnand wrote:
(I've moved this message from the SA list to the SURBL list where it's more relevant and won't get lost in the noise...)
At 06:14 22/04/2004, Dallas L. Engelken wrote:
I have just released SpamCopURI version 0.11. This fixes a
few bugs
that had been reported and adds open redirect resolution.
[...]
Just installed it... Can you tell me what is up with this.
@400000004086b7c400ac051c debug: Query failed for thegolfchannel.com.ws.surbl.org
@400000004086b7c400ad2244 debug: querying for www.thegolfchannel.com.ws.surbl.org
@400000004086b7c400ad262c
...
@400000004086b7c401bec124 debug: Query failed for www.thegolfchannel.com.ws.surbl.org
@400000004086b7c401c3a324 debug: querying for thegolfchannel.com.ws.surbl.org
It tried to query something like 20 times before it finally stopped. Does 'Query failed' actually mean the query failed, or just that no A record was found? If I send a test from the command line with a message that contains a URI on both lists, it works fine.
[root@localhost service]# echo -e 'From: dallase\n\n<a href="http://8006hosting.com">click here</A>' | spamc
...
 * 3.0 SC_URI_RBL Contains a URL listed in the SC SURBL blocklist
 * 2.5 WS_URI_RBL Contains a URL listed in the WS SURBL blocklist
...
Do I need a new DNS::Resolver or is this normal behavior?
I'm seeing the same thing with SpamCopURI-0.12 as well; I don't remember whether I was seeing it with 0.10 though. I've seen cases where one message causes 20 or more lookups for the "same" DNS record.
I think I've worked out what is happening. Basically, each different subdomain variation in the URLs found in a message causes a separate lookup, even though the base domains that are actually being looked up are the same. For example, I made a test message that looked like this:
http://serbserb.testdomain.co.nz/blah
http://sebserbr.testdomain.co.nz/blah
...
http://awefawfe.testdomain.co.nz/blah
http://awefawef.testdomain.co.nz/blah
http://awefawef.testdomain.co.nz/blah
Each URL has a randomized subdomain in front of the actual domain. Many spams with lots of image links (the ones selling printer cartridges, etc.) effectively do this.
Each URL above generated a DNS lookup for testdomain.co.nz.sc.surbl.org and co.nz.sc.surbl.org, so a total of 40 DNS lookups just for the sc list. I'm also using the ws and be lists, so that's a total of 120 DNS lookups generated by an email with 20 randomized URLs :(
Luckily local DNS caching largely offsets the problem, but it would be good to avoid it in the first place. Somehow, as each URL is stripped down, a list of stripped names needs to be built with duplicates removed before doing the DNS queries... extra coding, I guess...
Regards, Simon
Looks like it's probably a normal failure to resolve an A record, which means the domain is not on the list:
% nslookup thegolfchannel.com.ws.surbl.org
*** localhost.freeapp.net can't find thegolfchannel.com.ws.surbl.org: Non-existent host/domain
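(The same check from Perl with Net::DNS looks roughly like this - just a sketch, not SpamCopURI code; an answer with an A record means the domain is listed, no answer means it isn't:)

use strict;
use warnings;
use Net::DNS;

# Rough sketch: query the list zone directly.  An A record back means
# "listed"; no answer (NXDOMAIN) means "not listed".
my $res = Net::DNS::Resolver->new;

for my $name ('thegolfchannel.com.ws.surbl.org', '8006hosting.com.ws.surbl.org') {
    my $reply = $res->query($name, 'A');
    my @a = $reply ? grep { $_->type eq 'A' } $reply->answer : ();
    if (@a) {
        print "$name => listed (", $a[0]->address, ")\n";
    } else {
        print "$name => not listed\n";
    }
}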
Let me ask Eric if there's a way he can eliminate duplicate DNS queries. Perhaps that went away when he deprecated the use of Storable in SpamCopURI.
Simon, you're right that DNS caching means this doesn't hurt much in terms of performance. (It may even be faster than trying to store these in SA to prevent duplication.)
Jeff C.
At 13:34 22/04/2004, Jeff Chan wrote:
Simon, you're right that DNS caching means this doesn't hurt much in terms of performance. (It may even be faster than trying to store these in SA to prevent duplication.)
For people with a caching DNS server on the same machine or Ethernet segment, perhaps, but there are people running SA on Windoze machines over dialup connections (crazy but true ;) and for them it will most definitely not be faster :)
Regards, Simon
I don't remember whether I was seeing it with 0.10 though. I've seen cases where one message causes 20 or more lookups for the "same" DNS record.
I think I've worked out what is happening. Basically, each different subdomain variation in the URLs found in a message causes a separate lookup, even though the base domains that are actually being looked up are the same. For example, I made a test message that looked like this:
http://serbserb.testdomain.co.nz/blah
http://sebserbr.testdomain.co.nz/blah
http://bsertbse.testdomain.co.nz/blah
http://srtnsrtn.testdomain.co.nz/blah
http://nrtnsrtn.testdomain.co.nz/blah
http://saerbsee.testdomain.co.nz/blah
http://rtndrtsn.testdomain.co.nz/blah
http://nrtndrtn.testdomain.co.nz/blah
http://sdfgserg.testdomain.co.nz/blah
http://bcvcvbcx.testdomain.co.nz/blah
http://ergsergh.testdomain.co.nz/blah
http://qwertybe.testdomain.co.nz/blah
http://lphtrhtr.testdomain.co.nz/blah
http://bxdfbgnf.testdomain.co.nz/blah
http://ergerger.testdomain.co.nz/blah
http://cbxcvbxc.testdomain.co.nz/blah
http://tyjftyjt.testdomain.co.nz/blah
http://awefawfe.testdomain.co.nz/blah
http://awefawef.testdomain.co.nz/blah
http://awefawef.testdomain.co.nz/blah
Each URL has a randomized subdomain in front of the actual domain. Many spams with lots of image links (the ones selling printer cartridges, etc.) effectively do this.
Each URL above generated a DNS lookup for testdomain.co.nz.sc.surbl.org and co.nz.sc.surbl.org, so a total of 40 DNS lookups just for the sc list. I'm also using the ws and be lists, so that's a total of 120 DNS lookups generated by an email with 20 randomized URLs :(
Luckily local DNS caching largely offsets the problem, but it would be good to avoid it in the first place. Somehow, as each URL is stripped down, a list of stripped names needs to be built with duplicates removed before doing the DNS queries... extra coding, I guess...
I can add something that will cache the query results on a per-test basis, so the above scenario should be knocked down to just 3 queries instead of 120. I have been a little hesitant to cache misses, since I could see a miss becoming a hit later on, but since I would only be caching per test this shouldn't be an issue.
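Roughly the idea (an untested sketch, not the actual SpamCopURI code):

use strict;
use warnings;
use Net::DNS;

# Untested sketch: cache both hits and misses, keyed on the full lookup
# name, for the lifetime of a single test, so duplicate domains in one
# message cost only one query each per list.
my $res = Net::DNS::Resolver->new;

sub check_domains {
    my ($list_zone, @domains) = @_;   # e.g. ('sc.surbl.org', 'testdomain.co.nz', ...)
    my %cache;                        # lives only for this one test run
    my @hits;
    for my $domain (@domains) {
        my $name = "$domain.$list_zone";
        unless (exists $cache{$name}) {
            # misses get cached too (as undef), which is fine per test
            $cache{$name} = $res->query($name, 'A');
        }
        push @hits, $domain if $cache{$name};
    }
    return @hits;
}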
--eric
Regards, Simon
At 14:39 22/04/2004, Eric Kolve wrote:
I can add something that will cache the query results on a per-test basis, so the above scenario should be knocked down to just 3 queries instead of 120. I have been a little hesitant to cache misses, since I could see a miss becoming a hit later on, but since I would only be caching per test this shouldn't be an issue.
You mean 6 queries? Assuming you still test both the 2nd-level and 3rd-level possibilities separately, as now.
And as far as caching goes, it shouldn't be a problem, because you just want to avoid doing a whole string of identical DNS lookups - only cache identical lookups, and the caching should only last for one run of SA processing one message... (We assume that no new blacklist records appear in the middle of processing a specific message, or that if they do we don't care ;-)
Regards, Simon
On Thu, Apr 22, 2004 at 02:55:33PM +1200, Simon Byrnand wrote:
At 14:39 22/04/2004, Eric Kolve wrote:
I can add something that will cache the query results on a per-test basis, so the above scenario should be knocked down to just 3 queries instead of 120. I have been a little hesitant to cache misses, since I could see a miss becoming a hit later on, but since I would only be caching per test this shouldn't be an issue.
You mean 6 queries? Assuming you still test both the 2nd-level and 3rd-level possibilities separately, as now.
I have pulled the ccTLD code from URIRBL to extract the 'registrar domain' so we would only do one query per host instead of two (2nd and 3rd level).
This should work fine, since Jeff is doing the same thing on his side, and it will cut queries down quite a bit.
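In rough outline it looks something like this (just a sketch with a tiny hard-coded ccTLD table to show the shape of it - the real table is much longer):

use strict;
use warnings;

# Sketch only: keep three labels when the last two form a known two-level
# ccTLD, otherwise keep two, so each host yields exactly one 'registrar
# domain' and therefore one query per list.
my %two_level_cctld = map { $_ => 1 } qw(
    co.nz net.nz org.nz co.uk org.uk co.jp com.au
);

sub registrar_domain {
    my ($host) = @_;
    my @labels = split /\./, lc $host;
    return join('.', @labels) if @labels <= 2;
    my $last_two = join '.', @labels[-2, -1];
    my $keep = $two_level_cctld{$last_two} ? 3 : 2;
    return join '.', @labels[-$keep .. -1];
}

print registrar_domain('serbserb.testdomain.co.nz'), "\n";   # testdomain.co.nz
print registrar_domain('www.thegolfchannel.com'), "\n";      # thegolfchannel.com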
--eric
And as far as caching goes, it shouldn't be a problem, because you just want to avoid doing a whole string of identical DNS lookups - only cache identical lookups, and the caching should only last for one run of SA processing one message... (We assume that no new blacklist records appear in the middle of processing a specific message, or that if they do we don't care ;-)
Regards, Simon
On Wednesday, April 21, 2004, 8:20:40 PM, Eric Kolve wrote:
On Thu, Apr 22, 2004 at 02:55:33PM +1200, Simon Byrnand wrote:
I have pulled the ccTLD code from URIRBL to extract the 'registrar domain' so we would only do one query per host instead of two (2nd and 3rd level).
This should work fine since Jeff is doing the same thing on his side and will cut queries down quite a bit.
Yep. Sounds good!
Jeff C.
On Wednesday, April 21, 2004, 7:55:33 PM, Simon Byrnand wrote:
At 14:39 22/04/2004, Eric Kolve wrote:
I can add something that will cache the query results on a per-test basis, so the above scenario should be knocked down to just 3 queries instead of 120. I have been a little hesitant to cache misses, since I could see a miss becoming a hit later on, but since I would only be caching per test this shouldn't be an issue.
You mean 6 queries? Assuming you still test both the 2nd-level and 3rd-level possibilities separately, as now.
It sounds like Eric is changing SpamCopURI to pick the number of levels to test depending on ccTLD membership, which is probably fine; therefore it would do 3 queries.
And as far as caching goes, it shouldn't be a problem, because you just want to avoid doing a whole string of identical DNS lookups - only cache identical lookups, and the caching should only last for one run of SA processing one message... (We assume that no new blacklist records appear in the middle of processing a specific message, or that if they do we don't care ;-)
Yes, it's not clear if "per test" means per message, but it would seem so. That too should be fine. I don't think we should worry too much about the boundary condition of the SURBL changing in the middle of message processing, which seems like it would be uncommon. Per-message caching is already a big help.
Jeff C.
At 15:27 22/04/2004, Jeff Chan wrote:
On Wednesday, April 21, 2004, 7:55:33 PM, Simon Byrnand wrote:
At 14:39 22/04/2004, Eric Kolve wrote:
I can add something that will cache the query results on a per-test basis, so the above scenario should be knocked down to just 3 queries instead of 120. I have been a little hesitant to cache misses, since I could see a miss becoming a hit later on, but since I would only be caching per test this shouldn't be an issue.
You mean 6 queries? Assuming you still test both the 2nd-level and 3rd-level possibilities separately, as now.
It sounds like Eric is changing SpamCopURI to pick the number of levels to test depending on ccTLD membership, which is probably fine; therefore it would do 3 queries.
A thought just struck me...
At the moment it is not entirely foolproof to figure out how many levels there are in a given ccTLD domain name, and at surbl.org, in your backend processing system, you have a list which you have scraped together from various sources and update manually, etc.
As Eric pointed out in a previous message, trying to do the "right thing" on the client end for stripping the domain name accurately would mean duplicating the knowledge of that list in the client - something which could change from time to time... and if you get it wrong you potentially miss some hits.
Is there any way that you could automatically publish that information in the SURBL itself? Perhaps a client could retrieve a list of current ccTLD designations and whether they are 2nd level or 3rd level, persistently cache it for a few hours to a day, and refer to it during processing.
That way, as errors in the ccTLD designations come to light, or registrars add new ones, the changes could be automatically propagated down to the clients...
It's not the sort of thing that would need checking on every lookup; probably once a day would be fine. It could be handled in a similar way to how the razor client keeps a note of the last time it downloaded the list of discovery servers: when the time limit expires, the SURBL client attempts to download the latest list.
Another good thing would be that using this information at the client end would be optional...
I wonder if it would be worth it though, or maybe it's more trouble than just doing two queries as now?
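Roughly what I'm picturing on the client side, as an untested sketch - the URL is completely made up, this would only work if surbl.org actually published such a file:

use strict;
use warnings;
use LWP::Simple qw(mirror is_success);

# Untested sketch.  The URL below is hypothetical -- it only works if
# surbl.org actually publishes a list like this somewhere.
my $cctld_url  = 'http://www.surbl.org/two-level-tlds';
my $cache_file = '/var/cache/spamassassin/two-level-tlds';
my $max_age    = 24 * 60 * 60;        # refresh at most once a day

# Only re-fetch when the cached copy is older than a day; mirror()
# returns 304 when the server says the file hasn't changed.
if (!-e $cache_file or time - (stat $cache_file)[9] > $max_age) {
    my $rc = mirror($cctld_url, $cache_file);
    unless (is_success($rc) or $rc == 304) {
        warn "ccTLD list refresh failed: $rc\n";
    }
}

open my $fh, '<', $cache_file or die "no ccTLD list cached yet: $!";
my %two_level = map { chomp; ($_ => 1) } grep { /\S/ && !/^#/ } <$fh>;
close $fh;

print scalar(keys %two_level), " two-level ccTLD entries loaded\n";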
Regards, Simon
On Wednesday, April 21, 2004, 8:41:43 PM, Simon Byrnand wrote:
At the moment it is not entirely foolproof to figure out how many levels there are in a given ccTLD domain name, and at surbl.org, in your backend processing system, you have a list which you have scraped together from various sources and update manually, etc.
As Eric pointed out in a previous message, trying to do the "right thing" on the client end for stripping the domain name accurately would mean duplicating the knowledge of that list in the client - something which could change from time to time... and if you get it wrong you potentially miss some hits.
Is there any way that you could automatically publish that information in the SURBL itself? Perhaps a client could retrieve a list of current ccTLD designations and whether they are 2nd level or 3rd level, persistently cache it for a few hours to a day, and refer to it during processing.
That way, as errors in the ccTLD designations come to light, or registrars add new ones, the changes could be automatically propagated down to the clients...
It's an interesting proposal. We could use a separate zone like tld.surbl.org to get the info out.
Another, possibly better, approach would be for me to use the same ccTLD routines that Eric is borrowing from URIDNSBL on the data engine side. That is another way to get us all in sync, assuming the approach is reasonably sound. It's something I may check out later. Until then the ccTLD list will need to do.
Jeff C.
Hi!
It's not the sort of thing that would need checking on every lookup; probably once a day would be fine. It could be handled in a similar way to how the razor client keeps a note of the last time it downloaded the list of discovery servers: when the time limit expires, the SURBL client attempts to download the latest list.
Another good thing would be that using this information at the client end would be optional...
I wonder if it would be worth it though, or maybe it's more trouble than just doing two queries as now?
Just an idea: perhaps if we give back 127.0.0.8, for example, it could mean 'look this up again, it's a multi-level country domain'. Then we could list those domains dynamically. Only the code behind it would need to know how to deal with that. I don't know if that's possible.
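On the client side it might look something like this - purely hypothetical of course, no SURBL returns 127.0.0.8 this way today:

use strict;
use warnings;
use Net::DNS;

# Purely hypothetical sketch of the idea: if the two-label lookup comes
# back as 127.0.0.8, treat it as "multi-level ccTLD, take one more label
# from the host and query again".
my $res = Net::DNS::Resolver->new;

sub lookup_levels {
    my ($host, $zone, $levels) = @_;
    my @parts = split /\./, $host;
    my $name  = join('.', @parts[-$levels .. -1]) . ".$zone";
    my $reply = $res->query($name, 'A') or return undef;
    my ($a) = grep { $_->type eq 'A' } $reply->answer;
    return $a ? $a->address : undef;
}

my ($host, $zone) = ('serbserb.testdomain.co.nz', 'sc.surbl.org');
my $addr = lookup_levels($host, $zone, 2);           # co.nz.sc.surbl.org first
if (defined $addr and $addr eq '127.0.0.8') {        # the hypothetical marker
    $addr = lookup_levels($host, $zone, 3);          # testdomain.co.nz.sc.surbl.org
}
print defined $addr ? "listed: $addr\n" : "not listed\n";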
Bye, Raymond.
On Wed, Apr 21, 2004 at 08:27:58PM -0700, Jeff Chan wrote:
On Wednesday, April 21, 2004, 7:55:33 PM, Simon Byrnand wrote:
At 14:39 22/04/2004, Eric Kolve wrote:
I can add something that will cache the query results on a per-test basis, so the above scenario should be knocked down to just 3 queries instead of 120. I have been a little hesitant to cache misses, since I could see a miss becoming a hit later on, but since I would only be caching per test this shouldn't be an issue.
You mean 6 queries? Assuming you still test both the 2nd-level and 3rd-level possibilities separately, as now.
It sounds like Eric is changing SpamCopURI to pick the number of levels to test depending on ccTLD membership, which is probably fine; therefore it would do 3 queries.
And as far as caching goes, it shouldn't be a problem, because you just want to avoid doing a whole string of identical DNS lookups - only cache identical lookups, and the caching should only last for one run of SA processing one message... (We assume that no new blacklist records appear in the middle of processing a specific message, or that if they do we don't care ;-)
Yes, it's not clear if "per test" means per message, but it would seem so. That too should be fine. I don't think we should worry too much about the boundary condition of the SURBL changing in the middle of message processing, which seems like it would be uncommon. Per-message caching is already a big help.
'per test' literally means per test. So if you run three eval:check_spamcop_uri_rbl tests, then you will have at most three queries for the same domain.
I would cache per message, but there isn't a good place to cache this data. I have access to PerMsgStatus, but it's not a good idea to start shoving stuff into that hash, as other code depends on its structure.
--eric
Jeff C.
At 15:44 22/04/2004, Eric Kolve wrote:
And as far as caching goes, it shouldn't be a problem, because you just want to avoid doing a whole string of identical DNS lookups - only cache identical lookups, and the caching should only last for one run of SA processing one message... (We assume that no new blacklist records appear in the middle of processing a specific message, or that if they do we don't care ;-)
Yes, it's not clear if "per test" means per message, but it would seem so. That too should be fine. I don't think we should worry too much about the boundary condition of the SURBL changing in the middle of message processing, which seems like it would be uncommon. Per-message caching is already a big help.
'per test' literally means per test. So if you run three eval:check_spamcop_uri_rbl tests, then you will have at most three queries for the same domain.
I would cache per message, but there isn't a good place to cache this data. I have access to PerMsgStatus, but it's not a good idea to start shoving stuff into that hash, as other code depends on its structure.
Per test is the right way to do it though, isn't it? So that's fine. The cached results for sc.surbl.org should have no bearing on ws.surbl.org... we're just trying to avoid literally identical DNS queries, and spammer.com.sc.surbl.org and spammer.com.ws.surbl.org aren't identical.
But we don't want to see spammer.com.sc.surbl.org being queried 20 times for the same message ;-)
Regards, Simon