[SURBL-Discuss] Additional phish/fraud list

Jeff Chan jeffc at surbl.org
Sat Sep 18 12:20:17 CEST 2004

On Saturday, September 18, 2004, 2:38:29 AM, David Hooton wrote:
> On Sat, 18 Sep 2004 00:33:59 -0700, Jeff Chan <jeffc at surbl.org> wrote:

>> Most of the data looks pretty regular, but one difference
>> is that the mailpolice data has some records like these:
> <snip>
>> which we would typically try to reduce to their base (registrar)
>> domains.  Reducing would cause some obvious false positives, for
>> example comcast.net, if we did not happen to whitelist it.

> Hmm, this is not great.

>> One solution would be to not reduce.  Another would be to discard
>> these longer domains, but it's not too easy to detect which ones
>> to discard.  Neither solution is really great, but they're both
>> better than reducing, because of the FPs that would create.

> This is probably the best approach.

Thanks for the feedback!  :-)

BTW, for anyone who wants to check it out, the slightly
processed list, which would go into PH, is at:


The changes are my standard ones:

1. force to lower case
2. discard records that contain characters other than [a-z0-9\.\-]
(original style domain name restrictions)

Unusually, in this case I don't try to reduce domains under gTLDs
to two levels.
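
For illustration, the two cleanup steps can be sketched in Python as
below. This is only a minimal sketch, not the actual script used for the
list: the function name clean_records, the whitespace trimming, and the
sample records are all assumptions. No reduction to base domains is done,
so longer names pass through unchanged.

```python
import re

# Characters allowed by the original-style domain name restrictions.
VALID_RECORD = re.compile(r"[a-z0-9.\-]+")

def clean_records(lines):
    """Lowercase each record and drop any with characters outside [a-z0-9.-].

    Hypothetical helper: trimming whitespace and skipping empty lines are
    assumptions, not steps described in the message. Records are NOT
    reduced to their base domains.
    """
    cleaned = []
    for line in lines:
        record = line.strip().lower()   # step 1: force to lower case
        if record and VALID_RECORD.fullmatch(record):
            cleaned.append(record)      # step 2: keep only valid records
    return cleaned

# Made-up sample records, for illustration only.
sample = ["Login.Example.COM", "bad_host.example.net", "example.org/path"]
print(clean_records(sample))  # prints ['login.example.com']
```

Records containing an underscore or a slash are discarded outright
rather than repaired, which matches step 2 above.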

Jeff C.
