From: Jeff Chan
To: spamassassin-users
Date: Friday, April 23, 2004, 3:28:49 PM
Subject: Goofy domain names
===8<============== Original message text ===============
On Friday, April 23, 2004, 7:38:46 AM, Chris Santerre wrote:
> This is where BigEvil may start going. I can change mine in 2 secs to use /\d00\dhosting/, but as soon as I do that, it will be removed from be.surbl.org. For obvious reasons they can't use wildcards. All signs point to me changing BigEvil over to search for this kind of stuff, and simply adding any static ones I have to ws.surbl.org. But we'll see.
This is where our philosophies clash slightly.
SURBLs just want a list of known spam domains.
SA rulesets with wildcards try to match entire possible/probable classes of domain names based on observing prior types of variation.
Both approaches have their merits.
For my purposes, I'd just prefer to get the domains that have already been found in spam. I acknowledge that it doesn't have the predictive value of the class approach, but it also makes false positives less likely in principle. (Though in reality it's not very likely that any legitimate sites are suddenly going to start using rxmeds1.com, rxmeds2.com, rxmeds3.com, etc.)
Jeff C.
===8<===========End of original message text===========
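To make the trade-off concrete: a wildcard rule of the kind Chris describes might look like this in a SpamAssassin ruleset (the rule name, description, and score here are hypothetical, for illustration only):

    uri      NUM_HOSTING_SPAM  /\d00\dhosting\.com/i
    describe NUM_HOSTING_SPAM  URI matches numbered-hosting spam domain pattern
    score    NUM_HOSTING_SPAM  3.0

One such pattern covers all 100 possible \d00\d variants, whereas a list-based SURBL must record each reported domain individually.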
On Friday, April 23, 2004, 3:37:34 PM, Jeff Chan wrote:
> On Friday, April 23, 2004, 7:38:46 AM, Chris Santerre wrote:
>> [...]
> SURBLs just want a list of known spam domains.
> [...]
I should amend this: SURBLs don't care what domains are in them. be.surbl.org handles most of the wildcarded domains from BigEvil.cf and MidEvil.cf just fine.
It's only the more complex ones, with fancier patterns than simple alternation, that are not expanded into separate domain names by expand_regex.pl.
Still, in cases where the resulting patterns are too large to expand into all possible domains, I'd prefer to get a list of the actual reported ones, for use in be.surbl.org, instead of discarding them.
Jeff C.
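For context, "simple alternation" is the kind of pattern that can be enumerated mechanically. A minimal sketch of the idea in Perl (illustrative only; this is not the actual expand_regex.pl):

    #!/usr/bin/perl
    # Expand a single top-level alternation group into concrete names.
    # Handles only patterns of the form (a|b|c)suffix -- nothing fancier.
    use strict;
    use warnings;

    my $pattern = '(cheap|best|top)pills';   # hypothetical example pattern

    if ($pattern =~ /^\(([^)]+)\)(\w*)$/) {
        my ($alternatives, $suffix) = ($1, $2);
        print "$_$suffix.com\n" for split /\|/, $alternatives;
    }
    else {
        warn "too complex to expand: $pattern\n";
    }

This prints cheappills.com, bestpills.com, and toppills.com, each of which can be listed as an ordinary SURBL entry. Quantified patterns like \d+ are where this approach breaks down, as the discussion below shows.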
Jeff Chan wrote:
> [...]
> It's only the more complex ones, with fancier patterns than simple alternation, that are not expanded into separate domain names by expand_regex.pl.
Why use expand_regex.pl at all? Instead of taking /\d00\dhosting/ and expanding it into 100 potential domain names, why not simply accept a query like:
9001hosting.com.be.surbl.org
And, internally, run the regex /\d00\dhosting/ against it (and all the BigEvil regexes, for that matter), returning 127.0.0.2 on a match?
Ultimately, using expand_regex.pl is, imho, utterly self-defeating -- without stretching the imagination too much, even. If I merely change the above regex to:
/\d+00\d+hosting/
We're already looking at trillions of permutations!
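(A rough count, assuming each \d+ run is capped at six digits by practical label-length limits: each run can take 10 + 100 + ... + 10^6 = 1,111,110 values, so two independent runs give 1,111,110^2, roughly 1.2 x 10^12 names -- over a trillion.)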
It seems to me be.surbl.org should not do an internal database lookup on single domain names, but instead run the BigEvil regexes (like SA does). At least, that is what I thought it would do, and how I had hoped it would work.
Cheers,
- Mark
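A minimal sketch of the matching logic Mark proposes, in Perl (hypothetical; a real deployment would still need a custom DNS front-end around it, which is where the objections below come in):

    #!/usr/bin/perl
    # Strip the zone suffix from a query name and test what remains
    # against a list of BigEvil-style regexes.
    use strict;
    use warnings;

    my @patterns = ( qr/\d00\dhosting/, qr/rxmeds\d+/ );   # example patterns

    sub lookup {
        my ($qname) = @_;
        return undef unless $qname =~ s/\.be\.surbl\.org\.?$//i;
        for my $re (@patterns) {
            return '127.0.0.2' if $qname =~ $re;   # listed: positive answer
        }
        return undef;                              # not listed: NXDOMAIN
    }

    my $answer = lookup('9001hosting.com.be.surbl.org');
    print defined $answer ? $answer : 'NXDOMAIN', "\n";   # prints 127.0.0.2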
On Sat, 24 Apr 2004, Mark wrote:
> Why use expand_regex.pl at all? Instead of taking /\d00\dhosting/ and expanding it into 100 potential domain names, why not simply accept a query like 9001hosting.com.be.surbl.org and, internally, run the regex on it, returning 127.0.0.2 on a match? [...]
> It seems to me be.surbl.org should not do an internal database lookup on single domain names, but instead run the BigEvil regexes (like SA does). [...]
Because the DNS system is designed to work with static lists of data, not regex patterns.
Theoretically you could design a DNS server front-end on a regex engine that would take fixed DNS queries and do regex matches. However, the DNS zone-transfer mechanism is designed to deal with fixed data-sets, not regexes. Again, you could take standard DNS TXT records, overload them with special 'magic' tags to indicate that their data are regex patterns, and then design a special DNS server front-end that would convert those special zones into food for the regex engine back-end, but that would take special coding... Any way you cut it, you've blown the possibility of using the standard DNS packages and moved into custom-coding land.
Another detail is that the DNS system is heavily based upon the concept of caching. Currently there's no meaningful way to cache a regex, only the results of every lookup permutation. That would impose an unrealistic demand on the down-stream DNS servers (remember those trillions of permutations!), or demand a query all the way back to the authoritative servers on every lookup (i.e., not permit any caching), which would be a killer for them in bandwidth and CPU load.
Another detail: you're adding to the CPU load of the "regex-ified" DNS servers. A regex match is more CPU-intensive than a database lookup on a fixed data set. When you have the potential of thousands or millions of clients beating upon a handful of DNS servers, the CPU load considerations become critical.
Might as well stick with BigEvil.cf and the rsync distribution scripts that various people in SA land have developed (i.e., use already tried-and-tested means for those who want this kind of functionality).
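David's CPU point can be sketched with Perl's Benchmark module (the pattern list below is a stand-in for the thousands of BigEvil.cf rules; absolute numbers will vary by machine):

    #!/usr/bin/perl
    # Compare one hash lookup against scanning a list of regexes,
    # as a fixed-data-set DNSBL vs. a regex-matching server would do.
    use strict;
    use warnings;
    use Benchmark qw(cmpthese);

    my %listed   = map { "host$_.com" => 1 } 1 .. 1000;
    my @patterns = map { qr/host${_}\d\d\.com/ } 1 .. 1000;
    my $query    = 'example.com';   # unlisted: forces a full regex scan

    cmpthese(-2, {
        hash  => sub { my $hit = exists $listed{$query} },
        regex => sub {
            my $hit = 0;
            for my $re (@patterns) {
                if ($query =~ $re) { $hit = 1; last }
            }
        },
    });

The hash lookup is constant-time; the regex scan is linear in the number of patterns, and that gap is what the authoritative servers would feel under load.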
David B Funk wrote:
> On Sat, 24 Apr 2004, Mark wrote:
>> Why use expand_regex.pl at all? [...]
> Because the DNS system is designed to work with static lists of data, not regex patterns.
> Theoretically you could design a DNS server front-end on a regex engine that would take fixed DNS queries and do regex matches. However, the DNS zone-transfer mechanism is designed to deal with fixed data-sets, not regexes.
DNS zone-transfers would not be affected. It would still be the same static data-set being pushed back and forth. In fact, the be.surbl.org DNS server would not contain a regular DNS database at all (!), as it merely runs regexes on incoming queries. If anything, you'd have to transfer the BigEvil regex rules themselves.
> Another detail is that the DNS system is heavily based upon the concept of caching. Currently there's no meaningful way to cache a regex, only the results of every lookup permutation. That would impose an unrealistic demand on the down-stream DNS servers (remember those trillions of permutations!), [...]
Not at the client side. If I were to query 9001hosting.com.be.surbl.org, there would be no overhead whatsoever, cache-wise, for a front-end DNS system that does a behind-the-scenes regex on the query. It would still reply with a single response. To the client, it would appear no different than if the DNS server had retrieved the domain name from a database.
The caching DNS server at be.surbl.org itself might grow big, though, as it potentially holds trillions of those domain names. But it would, as you say, really only cache the result of each looked-up permutation, and no more. In this, it would be no different from a regular DNS server.
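For example, a resolver that has just cached the answer for 9001hosting.com.be.surbl.org still has to ask the authoritative server about 9002hosting.com.be.surbl.org; each permutation is a distinct name with its own cache entry, so caching only helps with repeats of the exact same query.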
> Another detail: you're adding to the CPU load of the "regex-ified" DNS servers. [...]
This is true. And it may indeed be quite a problem.
> Might as well stick with BigEvil.cf [...]
My point exactly: without real regex support in be.surbl.org, dropping BigEvil.cf for now is ill-advised, really.
Cheers!
- Mark
System Administrator Asarian-host.org
--- "If you were supposed to understand it, we wouldn't call it code." - FedEx