Just an idea; I don't know whether this has ever been discussed.
In the course of operating the SURBL lists, the real-time number of requests for each listed domain (and for each unlisted domain as well) is known, as are the IP addresses of the servers using SURBL to run the tests.
I don't have the data, but I suspect a spam run in progress would be easy to identify: a high number of requests for the spamvertized domain within a short period of time, coming from a large number of geographically diverse mail servers.
Using that data, it should be possible to add an activity bit that is set when the query activity for a domain crosses a predefined threshold (the exact recipe would need extensive tweaking).
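A minimal sketch of how such an activity bit might be computed, assuming the list operator can see each lookup's timestamp and the querying server's IP. The class name, the 10-minute window and both thresholds are made-up placeholders, not anything SURBL actually uses. The bit is set only when both the query volume and the number of distinct querying servers are high inside a sliding time window.

```python
import time
from collections import defaultdict, deque


class ActivityTracker:
    """Hypothetical per-domain lookup tracker for a SURBL-style list.

    Thresholds are illustrative only; the real recipe would need tuning.
    """

    def __init__(self, window=600.0, min_queries=500, min_servers=50):
        self.window = window            # sliding window length, in seconds
        self.min_queries = min_queries  # lookups needed inside the window
        self.min_servers = min_servers  # distinct querying servers needed
        self._events = defaultdict(deque)  # domain -> deque of (ts, server_ip)

    def record(self, domain, server_ip, ts=None):
        # Assumes lookups are recorded in non-decreasing timestamp order.
        ts = time.time() if ts is None else ts
        self._events[domain].append((ts, server_ip))
        self._expire(domain, ts)

    def _expire(self, domain, now):
        # Drop lookups older than the window from the left of the deque.
        q = self._events[domain]
        while q and q[0][0] < now - self.window:
            q.popleft()

    def activity_bit(self, domain, now=None):
        now = time.time() if now is None else now
        self._expire(domain, now)
        q = self._events[domain]
        servers = {ip for _, ip in q}
        return len(q) >= self.min_queries and len(servers) >= self.min_servers


if __name__ == "__main__":
    tracker = ActivityTracker(window=600, min_queries=100, min_servers=20)
    # 200 lookups over ~3 minutes from 40 different servers
    for i in range(200):
        tracker.record("spamvertized.example", f"192.0.2.{i % 40}", ts=1000.0 + i)
    print(tracker.activity_bit("spamvertized.example", now=1200.0))
```

Requiring many distinct servers, not just many lookups, keeps a single busy mail host from setting the bit on its own.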
With such an activity bit available, the score for the other tests could be lowered slightly, with the bit acting as a 'score booster'. That way, the effect of a false positive, or of a site generating so few tests that they don't constitute a 'real' spam run, would be smaller, while the detection score for an actively spamvertized site would increase.
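To make the arithmetic concrete, here is a sketch assuming a SpamAssassin-style additive score with its default spam threshold of 5.0. All rule weights below are invented for illustration and are not SURBL's or SpamAssassin's real values. A message that hits other rules worth 1.5 points is compared under today's scoring and under the proposed lowered score plus booster.

```python
REQUIRED_SCORE = 5.0  # SpamAssassin's default spam threshold

# Hypothetical weights, for illustration only.
OLD_LISTED = 4.0      # today: a listing alone
NEW_LISTED = 3.0      # proposed: a listing alone, slightly lowered
ACTIVITY_BOOST = 2.0  # proposed: added only when the activity bit is also set


def uri_score(listed: bool, active: bool, proposed: bool = True) -> float:
    """Score contributed by the URI blocklist lookup."""
    if not listed:
        return 0.0
    if not proposed:
        return OLD_LISTED
    return NEW_LISTED + (ACTIVITY_BOOST if active else 0.0)


def is_spam(other_rules: float, listed: bool, active: bool,
            proposed: bool = True) -> bool:
    """Total score = other rule hits + blocklist contribution."""
    return other_rules + uri_score(listed, active, proposed) >= REQUIRED_SCORE


if __name__ == "__main__":
    print("today, listed:             ", is_spam(1.5, True, False, proposed=False))
    print("proposed, listed, inactive:", is_spam(1.5, True, False))
    print("proposed, listed, active:  ", is_spam(1.5, True, True))
```

With these numbers the same message totals 5.5 today, 4.5 when the domain is listed but shows no spam-run activity (so a likely false positive no longer tips it over), and 6.5 when the activity bit is set.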
Also, since most legitimate mailing lists go to recipients in close geographic proximity, the geographic diversity of the servers querying for such lists' domains should be very different from that of a typical spam run. Such location pattern analysis could also be used (internally) as a warning for possible false positives. Going one step further, a 'spammy' query pattern on an unlisted domain might signal that the domain should be investigated and possibly listed.
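One possible way to put a number on "geographic diversity" is the Shannon entropy of the country distribution of the querying servers. This is a swapped-in technique, not something the list currently does. The sketch assumes a hypothetical `ip_to_country` mapping standing in for a GeoIP lookup, and the 3.0-bit cutoff (roughly eight equally represented countries) is an arbitrary placeholder.

```python
import math
from collections import Counter


def country_entropy(server_ips, ip_to_country):
    """Shannon entropy (in bits) of the countries of the querying servers.

    Each distinct server counts once, so one busy host can't dominate.
    Unknown IPs are grouped under "??".
    """
    countries = Counter(ip_to_country.get(ip, "??") for ip in set(server_ips))
    total = sum(countries.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in countries.values())


def review_hint(listed, active, server_ips, ip_to_country, spread_bits=3.0):
    """Internal hint for list maintainers (hypothetical policy)."""
    h = country_entropy(server_ips, ip_to_country)
    if listed and h < spread_bits:
        return "possible false positive"   # geographically tight, list-like
    if not listed and active and h >= spread_bits:
        return "investigate for listing"   # spammy pattern, not yet listed
    return "no action"


if __name__ == "__main__":
    # Hypothetical geo data using documentation IP ranges.
    spread = {f"192.0.2.{i}": c for i, c in
              enumerate(["US", "DE", "BR", "JP", "RU", "IN", "FR", "AU"])}
    local = {f"198.51.100.{i}": "NL" for i in range(10)}
    print(review_hint(False, True, list(spread), spread))
    print(review_hint(True, True, list(local), local))
```

A newsletter read mostly in one country scores near 0 bits, while a spam run hitting servers worldwide scores much higher; the cutoff would have to come from real query logs.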
Does it make sense?