[SURBL-Discuss] Additional phish/fraud list
Jeff Chan
jeffc at surbl.org
Sat Sep 18 12:20:17 CEST 2004
On Saturday, September 18, 2004, 2:38:29 AM, David Hooton wrote:
> On Sat, 18 Sep 2004 00:33:59 -0700, Jeff Chan <jeffc at surbl.org> wrote:
>> Most of the data looks pretty regular, but one difference
>> is that the mailpolice data has some records like these:
> <snip>
>> which we would typically try to reduce to their base (registrar)
>> domains. Reducing would cause some obvious false positives, for
>> example comcast.net, if we did not happen to whitelist it.
> Hmm, this is not great.
>> One solution would be to not reduce. Another would be to discard
>> these longer domains, but it's not too easy to detect which ones
>> to discard. Neither solution is really great, but they're both
>> better than reducing, because of the FPs that would create.
> This is probably the best approach.
Thanks for the feedback! :-)
BTW, for anyone who wants to check it out, the slightly
processed list, which would go into PH, is at:
http://spamcheck.freeapp.net/mailpolice-fraud.srt
The changes are my standard ones:
1. force to lower case
2. discard records that contain characters other than [a-z0-9\.\-]
(original style domain name restrictions)
Unusually, in this case I don't try to reduce domains under gTLDs to two levels.
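For illustration, the two steps above could be sketched roughly as follows. This is not the actual processing script, just a minimal Python sketch that assumes one record per line. The names clean_records and naive_reduce are made up for the example, as is the host "host.comcast.net". naive_reduce is included only to show the false positive that reducing would cause; it is deliberately not applied to this list.

```python
import re

# Assumed reading of [a-z0-9\.\-]: keep a record only if every character
# is a lowercase letter, a digit, a dot, or a hyphen.
ALLOWED = re.compile(r"[a-z0-9.\-]+")


def clean_records(lines):
    """Lowercase each record and drop those with disallowed characters."""
    kept = []
    for line in lines:
        record = line.strip().lower()   # step 1: force to lower case
        if ALLOWED.fullmatch(record):   # step 2: discard anything else
            kept.append(record)         # note: no reduction to two levels
    return kept


def naive_reduce(domain):
    """Keep only the last two labels (hypothetical, shows the FP risk)."""
    return ".".join(domain.split(".")[-2:])


if __name__ == "__main__":
    sample = ["Phish.Example.COM", "bad_host.example.net", "host.comcast.net"]
    for record in clean_records(sample):
        # Reducing "host.comcast.net" would list all of comcast.net.
        print(record, "->", naive_reduce(record))
```

The record with an underscore is dropped by step 2, and the last column shows what reducing would have listed instead of the full host name.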
Jeff C.