quickinspirations.com
Thoughts on this guy? 48 NANAS hits. Listed in dnsbl.njabl.org
Didn't we already classify this? I wonder if we can get a crossref lookup of the global whitelist, so we can see what has already been whitelisted.
I believe this is a spammer in sheep's clothing. It fools people into signing up for a stupid inspiration newsletter, but the subjects of the newsletter are stuff like:
Subject: Get a Credit Report Instantly!
Then an inspiration quote, then a big ad. I think we may see more of this in the future.
I think it is a UC candidate. Do we have enough listed in UC for testing yet?
Chris Santerre System Admin and SARE Ninja http://www.rulesemporium.com http://www.surbl.org 'It is not the strongest of the species that survives, not the most intelligent, but the one most responsive to change.' Charles Darwin
FP: smithbarney.com
This is a part of CitiBank, a MAJOR financial institution. Shows up in the "SC" part of SURBL.
I don't know if they spam or not, but this definitely should be white-listed ASAP. I'm embarrassed to tell a client of mine that THIS was the domain that caused some of his legitimate mail to get blocked.
I searched for smithbarney.com on NANAS and, at a glance, I think most of these NANAS hits are phishing e-mails.
Rob McEwen
FP: smithbarney.com
(followup comments)
I was trying to think... how did this one get on there? It seems like it just barely missed the various institutional-based whitelists.
I did a search of this on alexa.com and their site is ranked just inside the top 20,000 web sites.
SEE: http://www.alexa.com/data/details/?url=smithbarney.com
Then I thought, wouldn't it be interesting to run the top 20,000 Alexa sites against SURBL... double-check whichever of these are currently getting "caught" by SURBL. Remove any which should be removed, (I'm sure at least a few would remain in SURBL??). Then whitelist all of the 20k that haven't been specifically determined as needing to remain in SURBL.
Does anyone know, is there a quick way to get this list out of alexa?
Rob McEwen
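The cross-check proposed above could be sketched roughly as follows. This is only an illustration, not how SURBL itself is run: it assumes the combined zone name multi.surbl.org, that a listed domain answers with an address in 127.0.0.0/8, and that an unlisted one fails to resolve. The function names and the injectable resolve parameter are hypothetical, and hits are only reported for hand review, never removed automatically.

```python
import socket
from typing import Callable, Dict, Iterable, Optional

# Assumption: the combined SURBL zone; adjust if a different list is wanted.
SURBL_ZONE = "multi.surbl.org"


def surbl_query_name(domain: str, zone: str = SURBL_ZONE) -> str:
    """Build the DNS name to look up for a candidate domain."""
    return f"{domain.strip().lower().rstrip('.')}.{zone}"


def dns_resolver(name: str) -> Optional[str]:
    """Return the A record for name, or None if it does not resolve."""
    try:
        return socket.gethostbyname(name)
    except OSError:
        return None


def listed_domains(
    domains: Iterable[str],
    resolve: Callable[[str], Optional[str]] = dns_resolver,
) -> Dict[str, str]:
    """Map each domain that appears to be listed to the answer it returned.

    The result is a list of candidates to double-check by hand.
    """
    hits = {}
    for domain in domains:
        answer = resolve(surbl_query_name(domain))
        if answer is not None and answer.startswith("127."):
            hits[domain] = answer
    return hits
```

Calling listed_domains() on a list of high-traffic domains would then give the short list to double-check. Passing a different resolve function makes it possible to try the logic without touching DNS.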
Rob McEwen wrote:
FP: smithbarney.com
(followup comments)
I was trying to think... how did this one get on there? It seems like it just barely missed the various institutional-based whitelists.
I did a search of this on alexa.com and their site is ranked just inside the top 20,000 web sites.
SEE: http://www.alexa.com/data/details/?url=smithbarney.com
Then I thought, wouldn't it be interesting to run the top 20,000 Alexa sites against SURBL... double-check whichever of these are currently getting "caught" by SURBL. Remove any which should be removed, (I'm sure at least a few would remain in SURBL??). Then whitelist all of the 20k that haven't been specifically determined as needing to remain in SURBL.
Guys...... SURBL is used by the world, not only the US
Alexa.com doesn't have the best of reputations on this side of the pond.
Their Privacy Policy is dubious: -------------- ALEXA'S TOOLBAR SERVICE COLLECTS AND STORES INFORMATION ABOUT THE WEB PAGES YOU VIEW, THE DATA YOU ENTER IN ONLINE FORMS AND SEARCH FIELDS, AND, WITH VERSIONS 5.0 AND HIGHER, THE PRODUCTS YOU PURCHASE ONLINE WHILE USING THE TOOLBAR SERVICE. ALTHOUGH ALEXA DOES NOT ATTEMPT TO ANALYZE WEB USAGE DATA TO DETERMINE THE IDENTITY OF ANY ALEXA USER, SOME INFORMATION COLLECTED BY THE TOOLBAR SERVICE IS PERSONALLY IDENTIFIABLE. ALEXA AGGREGATES AND ANALYZES THE INFORMATION IT COLLECTS TO IMPROVE ITS SERVICE AND TO PREPARE REPORTS ABOUT AGGREGATE WEB USAGE AND SHOPPING HABITS. --------------- more @ http://pages.alexa.com/help/privacy.html
Pls don't force whitelisting more than necessary, or put these domains in your site's whitelist but spare us whitelisting their associates as much as possible
Alex
Alex said:
Pls don't force whitelisting more than necessary, or put these domains in your site's whitelist but spare us whitelisting their associates as much as possible
Alex,
I think what you are saying is beside the point because, regardless of Alexa's business practices, the fact remains that:
(1) There is going to be a very, very, very strong correlation of Alexa's rankings to what sites people are actually visiting most often. (Is there a more accurate list out there, anywhere?)
(2) I think that the top 20,000 alexa sites are going to have a very high probability of being domains which get mentioned in hams fairly often.
(3) This test I propose will probably find very, very few hits on SURBL among these sites in the first place. And, as I said, not all of these should be automatically removed from SURBL. I specifically said that these should be **double checked**... NOT automatically removed. You talk as if I suggested that these be automatically removed. I just said to double-check these.
(4) If we only find one or two domains which really should be removed, this could be of potentially great benefit toward reducing FPs. Especially since smithbarney.com, an obvious candidate for whitelisting, was one of the least-frequented sites on this list of 20,000.
If this results in significant reduction of FPs, then perhaps we should do it again with alexa.com rankings 20k through 50k??
Rob McEwen
At 15:51 2004-09-30 -0400, Rob McEwen wrote:
I think what you are saying is beside the point because, regardless of Alexa's business practices, the fact remains that:
(1) There is going to be a very, very, very strong correlation of Alexa's rankings to what sites people are actually visiting most often. (Is there a more accurate list out there, anywhere?)
The Alexa list is based on the historical statistical multiple-domain-consolidated behaviour of a relatively small sampling of people gullible enough to install the Alexa toolbar even though it's classified as spyware by major antispyware programs. Those people are not representative in general, and in particular they are more likely to be fooled into all sorts of spam scams. And the actual ranking algorithms are beyond dubious.
(2) I think that the top 20,000 alexa sites are going to have a very high probability of being domains which get mentioned in hams fairly often.
What is that assumption based on? If you said top 50 I could possibly buy that assumption, but below that level Alexa is very unreliable.
Visit www.rockwelldatacorp.com with a browser with Alexa toolbar installed. Should rockwelldatacorp.com be whitelisted? Should domainsponsor.com be whitelisted? This is at position 374 in Alexa's traffic ranking...
Patrik
Patrik,
You make some good points!
But I still think that my idea is valid because I suspect that it may help us to find one or two more egregious FPs (like smithbarney.com). Would that not make this idea very worthwhile? Also, do you know of a better traffic ranking list?
Rob McEwen
At 17:01 2004-09-30 -0400, Rob McEwen wrote:
Patrik,
You make some good points!
But I still think that my idea is valid because I suspect that it may help us to find one or two more egregious FPs (like smithbarney.com). Would that not make this idea very worthwhile?
I don't think that more extensive whitelistings is the answer to current FP problems. Less "didn't even check nanas/sbl/the actual site" listings is a much better way to handle that problem, at least for WS. We can not solve the FP problem by using even more whitelistings. It will just create a new problem of FWLs - False White Listings.
Also, do you know of a better traffic ranking list?
Yes I do, but as the level of traffic doesn't correspond to the level of non-spamminess, I really don't think it's relevant in this context.
Patrik
On Thursday, September 30, 2004, 2:18:28 PM, Patrik Nilsson wrote:
At 17:01 2004-09-30 -0400, Rob McEwen wrote:
But I still think that my idea is valid because I suspect that it may help us to find one or two more egregious FPs (like smithbarney.com). Would that not make this idea very worthwhile?
I don't think that more extensive whitelistings is the answer to current FP problems. Less "didn't even check nanas/sbl/the actual site" listings is a much better way to handle that problem, at least for WS. We can not solve the FP problem by using even more whitelistings. It will just create a new problem of FWLs - False White Listings.
I disagree somewhat. If we had a "list of every legitimate domain", I'd probably want to use it. However no such list exists. ;-)
Also, do you know of a better traffic ranking list?
Yes I do, but as the level of traffic doesn't correspond to the level of non-spamminess, I really don't think it's relevant in this context.
Level of traffic probably has some relationship to mentions in hams.
What other traffic ranking systems are you aware of?
Jeff C. -- "If it appears in hams, then don't list it."
Patrik:
I understand that you don't want domains to be whitelisted solely on the basis of their web site traffic if they really shouldn't be whitelisted. I've addressed this particular concern at least twice now, and it still seems as though you didn't actually read that part of my posts? Also, if what I'm proposing is such a waste of time, then I rest my case on the simple and indisputable fact that smithbarney.com WOULD have been spotted and whitelisted earlier if my idea had already been implemented. (But, as Jeff said, this is a moot point given that this data is not readily available).
Your suggestions for improvements are very wise and there is no reason why we can't pursue BOTH angles. This is not an either/or situation.
Rob McEwen
At 20:38 2004-09-30 -0400, Rob McEwen wrote:
I understand that you don't want domains to be whitelisted solely on the basis of their web site traffic if they really shouldn't be whitelisted. I've addressed this particular concern at least twice now, and it still seems as though you didn't actually read that part of my posts?
Yes I did.
I just have a major problem with this approach in general.
Extensive whitelistings will never solve the real problem - too many FPs in the input. Using traffic/size/etc-data as input for decisions on which sites to whitelist solves even less. It might minimize some of the more obvious FPs, but I'm actually more worried about the FPs that are not so obvious - the smaller sites and companies that are not on any "largest/most visited/etc" lists and don't get noticed immediately, that will linger on as FPs until someone who happens to hand check a message recognizes the domain as legit. The real big ones will show up and be whitelisted quite quickly anyway. But they are just a small percentage of the actual FPs that I encounter. Most are smaller or non-US sites that didn't immediately ring a bell and will still not ring a bell regardless of how many "big company/lots of traffic/etc" lists we use as whitelisting sources.
Arguing that "if it eliminates FP X, Y days earlier than it got whitelisted anyway, it's worth the effort" doesn't cut it. What gets done is limited by resources and focus. If efforts and focus go into another whitelist source, less efforts will go into something else that might be more worthwhile. Like, in my opinion, making the initial listings trackable.
I also have a particular problem or two with using Alexa.
Alexa produces dubious data using dubious methods. I think associating with them is a bad idea, and using anything below their top 50 as an indication that a domain is legit and non-spammy will produce a new set of bad data instead of cleaning up the initial one.
Patrik
I'm trying to find a good way to get the alexa.com data directly off of their web site. Their sub-categories say they can be listed by popularity, but they give some very weird mixes of rankings.
Another interesting thing is to try searching Google.com using the following:
site:alexa.com "traffic rank for" "related info"
and
site:alexa.com "traffic rank for" "related info" business
(or add category or other keyword to the end)
The actual domain is embedded in the page titles. But, without some automation, it would still be a lot of work to screen-scrape harvest these and the results are still rather randomly ordered.
ANOTHER GOOD RESOURCE:
http://www.port80software.com/surveys/top1000webservers/
This was a survey of large companies' web servers. If you look at the source for these pages, it would be easy to parse the domains from these mere 12 pages. But this list might not be any better than the lists you have already checked against in the past.
I'm still nowhere near to a list as large and valuable as the Alexa top 20,000 would have been. I'm still looking :)
Rob McEwen
On Thursday, September 30, 2004, 5:48:44 PM, Rob McEwen wrote:
This was a survey of large companies' web servers. If you look at the source for these pages, it would be easy to parse the domains from these mere 12 pages. But this list might not be any better than the lists you have already checked against in the past.
Hi Rob, I can scrape those. It looks like the web sites of the Fortune 1000. Probably already have most of them...
Jeff C. -- "If it appears in hams, then don't list it."
On Thursday, September 30, 2004, 5:58:45 PM, Jeff Chan wrote:
On Thursday, September 30, 2004, 5:48:44 PM, Rob McEwen wrote:
This was a survey of large companies' web servers. If you look at the source for these pages, it would be easy to parse the domains from these mere 12 pages. But this list might not be any better than the lists you have already checked against in the past.
I can scrape those. It looks like the web sites of the Fortune 1000. Probably already have most of them...
OK I grabbed the Fortune 1000 web sites from that site, and as expected, we already have all of them. There are 997 after removing a duplicate and a couple with no sites:
http://spamcheck.freeapp.net/whitelists/fortune1000.srt
Jeff C. -- "If it appears in hams, then don't list it."
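The clean-up step described above (dropping entries with no site and removing duplicates) might look something like this. It is a minimal sketch, not the script actually used: the function names are hypothetical, the input is assumed to be one site URL or hostname per entry, and only a leading "www." is stripped, so no attempt is made to reduce hosts to registered domains.

```python
from typing import Iterable, List, Optional
from urllib.parse import urlsplit


def site_to_domain(site: str) -> Optional[str]:
    """Reduce a scraped site entry to a lowercase hostname, or None if empty."""
    site = site.strip()
    if not site:
        return None
    # urlsplit only recognises the host when the entry has a "//" prefix.
    if "//" not in site:
        site = "//" + site
    host = urlsplit(site).hostname
    if not host:
        return None
    return host[4:] if host.startswith("www.") else host


def build_whitelist(sites: Iterable[str]) -> List[str]:
    """Return the sorted, de-duplicated hostnames from scraped entries."""
    return sorted({d for d in map(site_to_domain, sites) if d})
```

Sorting the result keeps successive snapshots of the list easy to diff against each other.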
Regarding looking for these lists... there is one big caveat:
SEE HERE: "Ads better trafficked than major Web sites" http://news.com.com/2100-1023-268516.html?legacy=cnet
As I traverse through some lists, I'm finding this to be true. This is one reason why alexa's data is probably better, since it focuses on the main address that people actually go to in their browser. Some of these other services just measure raw ISP requests, and these get a lot of noise.
Any suggestions from anyone would be helpful.
I know, I'm obsessed... but you would be too if your client just had mail blocked because it contained a domain name from such a well-known business. I simply want to find a way to **minimize** the chances of this happening again.
Rob McEwen
On Thursday, September 30, 2004, 12:51:32 PM, Rob McEwen wrote:
(1) There is going to be a very, very, very strong correlation of Alexa's rankings to what sites people are actually visiting most often. (Is there a more accurate list out there, anywhere?)
(2) I think that the top 20,000 alexa sites are going to have a very high probability of being domains which get mentioned in hams fairly often.
(3) This test I propose will probably find very, very few hits on SURBL among these sites in the first place. And, as I said, not all of these should be automatically removed from SURBL. I specifically said that these should be **double checked**... NOT automatically removed. You talk as if I suggested that these be automatically removed. I just said to double-check these.
(4) If we only find one or two domains which really should be removed, this could be of potentially great benefit toward reducing FPs. Especially since smithbarney.com, an obvious candidate for whitelisting, was one of the least-frequented sites on this list of 20,000.
If this results in significant reduction of FPs, then perhaps we should do it again with alexa.com rankings 20k through 50k??
I agree with your theory that there is probably a strong correlation between commonly visited sites and those mentioned in hams.
However the point is moot if they won't give us a snapshot of the data. They will sell a feed of the data for money, but that's less interesting.
Jeff C. -- "If it appears in hams, then don't list it."
On Thursday, September 30, 2004, 12:03:21 PM, Rob McEwen wrote:
FP: smithbarney.com
(followup comments)
I was trying to think... how did this one get on there? It seems like it just barely missed the various institutional-based whitelists.
I did a search of this on alexa.com and their site is ranked just inside the top 20,000 web sites.
Then I thought, wouldn't it be interesting to run the top 20,000 Alexa sites against SURBL... double-check whichever of these are currently getting "caught" by SURBL. Remove any which should be removed, (I'm sure at least a few would remain in SURBL??). Then whitelist all of the 20k that haven't been specifically determined as needing to remain in SURBL.
Does anyone know, is there a quick way to get this list out of alexa?
Rob McEwen
We are quietly using the Alexa 500 since it's published. At the suggestion of other spam fighters, I wrote Alexa for a larger list but never heard back from them.
Jeff C. -- "If it appears in hams, then don't list it."
FP: smithbarney.com
Domain Name: SMITHBARNEY.COM ... Name Server: NS2.NSROOT2.COM Name Server: NS1.NSROOT1.COM ... Creation Date: 18-nov-1994
Ten years old and, of course, no SBL hits on the name servers. The age alone should virtually guarantee that it gets excluded.
That's one nice thing about phishing URLs: The ham and the spam domains tend to neatly fall into two categories, (very) old and very new (<7 days).
Joe
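The age observation above could be turned into a rough pre-filter along these lines. This is a sketch under stated assumptions: the date format is the "18-nov-1994" style shown in the whois output, the 7-day cutoff comes from the message, and the 5-year threshold for "old" is purely a placeholder. The function names are hypothetical.

```python
from datetime import date

MONTHS = {
    name: number
    for number, name in enumerate(
        ["jan", "feb", "mar", "apr", "may", "jun",
         "jul", "aug", "sep", "oct", "nov", "dec"],
        start=1,
    )
}


def parse_whois_date(text: str) -> date:
    """Parse a whois creation date such as '18-nov-1994'."""
    day, month, year = text.strip().lower().split("-")
    return date(int(year), MONTHS[month], int(day))


def age_bucket(created: date, today: date,
               new_days: int = 7, old_years: int = 5) -> str:
    """Classify a domain by registration age.

    new_days follows the '<7 days' remark; old_years is an assumed cutoff.
    """
    age_days = (today - created).days
    if age_days < new_days:
        return "very new"
    if age_days >= old_years * 365:
        return "old"
    return "in between"
```

A domain in the "old" bucket would be a candidate for a second look before listing; the age alone does not prove that it is legitimate.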
On Thursday, September 30, 2004, 1:23:02 PM, Joe Wein wrote:
FP: smithbarney.com
Domain Name: SMITHBARNEY.COM
... Name Server: NS2.NSROOT2.COM Name Server: NS1.NSROOT1.COM ... Creation Date: 18-nov-1994
Ten years old and, of course, no SBL hits on the name servers. The age alone should virtually guarantee that it gets excluded.
That's one nice thing about phishing URLs: The ham and the spam domains tend to neatly fall into two categories, (very) old and very new (<7 days).
Joe
I've whitelisted: smithbarney.com .
Thanks for the research Rob, Chris and Joe.
BTW, please feel free to whitelist these obvious ones immediately if I'm not around.
(Smith Barney is obvious to us U.S. folks who have been bombarded with their (legitimate) financial services advertising for like 30 years. Some of our companies have probably used them also.)
Jeff C. -- "If it appears in hams, then don't list it."
At 09:25 2004-09-30 -0700, Bret Miller wrote:
quickinspirations.com
When this one came up here, every person who received it classified it as spam when I asked and so it remains that in my mind.
And I still haven't seen any response actually arguing a real reason why quickinspirations.com should be whitelisted.
We're not just whitelisting domains because someone, who doesn't even bother to argue why, asks us to, do we?
"This is reported as spam, looks like spam and smells like spam, but we will whitelist it just because it might be caught by other antispam systems anyway" isn't a very convincing argument.
Patrik
On Thu, 30 Sep 2004 09:25:25 -0700, Bret Miller bret.miller@wcg.org wrote:
quickinspirations.com
When this one came up here, every person who received it classified it as spam when I asked and so it remains that in my mind.
Same for us.