URL shortener abuse data - Discuss

21 Dec 2009


As I've mentioned here before, I write an open source URL shortening
service. I also operate one publicly, since that was the only way I was
ever going to get useful real-world data to work with. To look at the
code, though, you'd think I run an anti-abuse system that also shortens
URLs. So goes the Internet, I guess. While I have a tiny number of
users relative to any URL shortening service you've ever heard of, a
couple of thousand people have downloaded the software, and I'm aware of
a couple of dozen public installations. Between all the installations
that may be out there, we might just produce enough abuse data to make
it worth someone's while to use it.

As I'm about to do a lot of work on the software in preparation for a
new release, I'd like to get anyone's thoughts about the following:

1. Interest: Is anyone interested in using URI abuse data produced by
URL shorteners?
2. Ideal way to supply the data: I'd rather supply it to SURBL or some
other well-run URI BL for inclusion in an existing URI BL service than
operate one for this purpose.  I'm looking to use anything that remotely
resembles standards for everyone's sake, rather than re-invent the wheel.
3. Trust/Reputation: On a couple of occasions I've discovered that my
mail server had been using abuse data from shoddily run operations,
and frankly this is much, much worse than getting some spam. I don't
intend to be Yet Another Irresponsible BL Operator running an absentee
automated blocklist data source. Yet, in order to have enough data to
be worth sending somewhere, I'm going to need to accept input from my
users, both of the service and of the source code. I plan on doing
something similar to Craigslist, whereby people can report abuse from the
web site and a certain number of complaints will disable the
redirection. As for my source code users, I think I need some kind of
reputation system. Any thoughts about this would be appreciated.
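
To make the complaint-threshold idea in item 3 concrete, here is a rough
sketch in Python. It is only an illustration, not code from the software:
the class name `AbuseTracker`, the default threshold of 3, and the choice
to count distinct reporters rather than raw complaints are all
assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical default; the post only says "a certain number of complaints".
COMPLAINT_THRESHOLD = 3


class AbuseTracker:
    """Track abuse reports per short code and disable redirects past a threshold."""

    def __init__(self, threshold=COMPLAINT_THRESHOLD):
        self.threshold = threshold
        # short code -> set of reporter identifiers (e.g. IP address or account).
        # Using a set is an assumption: repeat reports from one source count once.
        self.reports = defaultdict(set)
        # short codes whose redirection has been switched off
        self.disabled = set()

    def report(self, short_code, reporter_id):
        """Record a complaint; return True if the redirect is now disabled."""
        self.reports[short_code].add(reporter_id)
        if len(self.reports[short_code]) >= self.threshold:
            self.disabled.add(short_code)
        return short_code in self.disabled

    def is_active(self, short_code):
        """Return True if the short code should still redirect."""
        return short_code not in self.disabled
```

The disabled set is also the sort of thing that could be exported to a
URI BL, though what format that would take is exactly the open question
in item 2.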
- Ron