On Friday, September 17, 2004, 3:10:48 PM, Jay Swackhamer wrote:
Raymond Dijkxhoorn wrote:
Are those zones available for rsync (rbldnsd format) ? Can test it on my own cluster also then right now.
Nice to see more lists starting off.
The fraud list has been operating publicly since around March.
OK, taking a look at the fraud.rhs.mailpolice.com data, there's not too much overlap with the MailSecurity phishing data which we're currently using in PH in multi.surbl.org.
The former has about 260 records, and the latter has about 400 records, and the overlap is around 25 records. So adding in the mailpolice fraud data would grow PH by about 240 new records.
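For anyone who wants to repeat the overlap check on their own copies of the data, here is a minimal Python sketch. The function name and the sample records are made up for illustration; the real lists would be loaded from the zone files.

```python
def overlap_stats(candidate, existing):
    """Return (overlap_count, new_count) for merging `candidate` into `existing`."""
    candidate, existing = set(candidate), set(existing)
    overlap = candidate & existing   # records already present in the existing list
    new = candidate - existing       # records the merge would actually add
    return len(overlap), len(new)

# Made-up sample records, not real list data.
candidate_list = {"a.example", "b.example", "c.example"}
existing_list = {"c.example", "d.example"}
print(overlap_stats(candidate_list, existing_list))
```

With roughly 260 candidate records and an overlap of roughly 25, the second number comes out near 235, which is in line with the "about 240" estimate above.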
Most of the data looks pretty regular, but one difference is that the mailpolice data has some records like these:
  1380781-usd10.e-gold.com
  accountassistant.z6.com.br
  cgi4-awconfirmisapidll-38u3428.cjb.net
  citibank.com.userset.net
  dfko49b.mail333.com
  halifax-online.co.uk.userset.net
  homelink.form.accepted.cc
  homelink.form.accepted.pula.cc
  paypalzzzz.tripod.com
  pcp09296036pcs.arlngt01.va.comcast.net
  proba21.netfirms.com
  rranostand.home.ro
  xyxca.home.ro
  your-tradetool.com
  zyxell9.clawz.com
  [not a complete list of these longer domains]
which we would typically try to reduce to their base (registrar) domains. Reducing would cause some obvious false positives, for example comcast.net, if we did not happen to whitelist it.
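To illustrate the reduction and why the whitelist matters, here is a rough Python sketch. It is not the actual SURBL data-processing code: the two-level TLD table and the whitelist below are tiny hypothetical stand-ins, and the function names are invented for this example.

```python
# Hypothetical, deliberately incomplete tables for illustration only.
TWO_LEVEL_TLDS = {"co.uk", "com.br"}
WHITELIST = {"comcast.net", "e-gold.com"}

def reduce_to_base(host):
    """Reduce a hostname to its base domain using the small table above."""
    labels = host.lower().rstrip(".").split(".")
    # Keep three labels when the last two form a known two-level TLD.
    if len(labels) >= 3 and ".".join(labels[-2:]) in TWO_LEVEL_TLDS:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])

def reduce_records(records):
    """Return (kept, dropped): reduced domains, split by whitelist membership."""
    kept, dropped = set(), set()
    for record in records:
        base = reduce_to_base(record)
        (dropped if base in WHITELIST else kept).add(base)
    return kept, dropped

kept, dropped = reduce_records([
    "pcp09296036pcs.arlngt01.va.comcast.net",
    "accountassistant.z6.com.br",
    "halifax-online.co.uk.userset.net",
])
print(sorted(kept))
print(sorted(dropped))
```

In this sketch comcast.net is only kept out of the result because it happens to be in the whitelist; any legitimate base domain missing from the whitelist would be listed.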
Some of these also don't make sense. e-gold.com is legitimate, and www.e-gold.com and 1380781-usd10.e-gold.com resolve to the same IP address. Why would e-gold phish themselves or allow a phisher to be hosted on their main web server?
One solution would be to not reduce. Another would be to discard these longer domains, but it's not too easy to detect which ones to discard. Neither solution is really great, but they're both better than reducing, because of the FPs that would create.
The un-reduced longer domains basically won't be matched by most code using SURBLs, because the client-side code usually tries to reduce to base domains. So if we leave the longer domains in the data, aside from making the data a little larger, it doesn't have too much downside. On the other hand, since multi.surbl.org is a logical "or" of all domains, any extra records in any list going into multi.surbl.org make multi unnecessarily longer. But the number of these longer domains is probably minor in the larger picture: a dozen or two.
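A small Python sketch of that behavior, with assumptions stated up front: a real client does a DNS query against the list rather than a set lookup, the reduction here is a naive "last two labels" rule, and the other.example record is invented to stand in for a second list.

```python
def naive_base(host):
    """Naive client-side reduction: keep only the last two labels."""
    return ".".join(host.lower().rstrip(".").split(".")[-2:])

def client_hit(uri_host, listed):
    """Model a client that reduces the URI host before checking the list."""
    return naive_base(uri_host) in listed

# PH data left un-reduced, plus an invented second list.
ph = {"zyxell9.clawz.com", "your-tradetool.com"}
other_list = {"other.example"}

# multi is modeled as the union (logical "or") of its component lists.
multi = ph | other_list

print(client_hit("www.your-tradetool.com", multi))  # base domain is listed
print(client_hit("zyxell9.clawz.com", multi))       # client asks about clawz.com
print(len(multi))
```

The longer record still takes up space in multi, but a reducing client never asks about it, so it produces neither hits nor false positives.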
Also Jay: example.tld is on the list. That doesn't resolve and probably isn't useful for fraud or phishing so you may want to consider removing it. ;-)
It would be nice to figure out these issues before adding the mailpolice fraud data into PH.
Comments?
Jeff C.