
> Dr. Lum's point was that predictive policing software merely hides this dynamic under a layer of black-box ML crap. Because the training data is itself the result of this type of bad policing, the resulting model can only further engrain these practices, it can't offer truly novel solutions.

Anyone can lie with statistics, but this isn’t really how ML works. Such a model would not even appear to perform well, on top of not actually performing well.

If you had a model that predicted the probability of getting a drug arrest, it should work just fine even if you give it an abundance of examples from the same area, as long as the arrest rate there is the same as in other places. That is to say, it should not learn that these areas are different.
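A toy sketch of that claim (area names, visit counts, and rates are all made up): if the true per-visit arrest probability is identical in both areas, the empirical estimate of P(arrest | visit) lands near the same value regardless of how lopsided the visit counts are.

```python
import random

random.seed(1)

# Made-up numbers: the true per-visit arrest probability is identical
# in both areas, but area_a gets 20x more visits than area_b.
P_ARREST = 0.2
visits = {"area_a": 10000, "area_b": 500}

est = {}
for area, n in visits.items():
    arrests = sum(random.random() < P_ARREST for _ in range(n))
    est[area] = arrests / n  # empirical estimate of P(arrest | visit)

print(est)  # both estimates land near 0.2 despite the 20x visit imbalance
```

Conditioning on the visit is the whole trick here: the raw arrest *counts* differ by 20x, but the per-visit *rate* does not.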



I’m not sure I understand your point. If the model is predicting the probability that police will make an arrest in a location, then I think it could perform well on rudimentary classifier metrics without working well in the more general sense of resolving crime. If police could make drug arrests in many locations, but tend to do so in ZIP codes with low socioeconomic indices, then the model will predict more arrests where socioeconomic indices are low. Police acting on this intel will turn up true positives, and the classifier will get high marks. But because you’re not able to assess the false negative rate properly (the volume of crimes that police didn’t make arrests for), you’re unable to holistically evaluate its performance. I guess in that second sense the model isn’t actually performing well, but because that can’t be measured, it doesn’t really get monitored.
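A rough simulation of that measurement gap (the ZIP names, patrol pattern, and rates are all invented for illustration): crime occurs at the same rate everywhere, but arrests are only recorded where police patrol, so half the crime never enters any arrest-based metric.

```python
import random

random.seed(0)

# Invented setup: 10 ZIP codes with an identical true crime rate, but
# police patrol (and so record arrests in) only the first five.
ZIPS = [f"zip_{i}" for i in range(10)]
PATROLLED = set(ZIPS[:5])
TRUE_CRIME_RATE = 0.2  # same everywhere

total_crimes = {z: 0 for z in ZIPS}
total_arrests = {z: 0 for z in ZIPS}
for _ in range(100):  # 100 simulated days, 100 crime opportunities per ZIP
    for z in ZIPS:
        crimes = sum(random.random() < TRUE_CRIME_RATE for _ in range(100))
        total_crimes[z] += crimes
        # arrests are only recorded where police are present
        total_arrests[z] += crimes if z in PATROLLED else 0

caught = sum(total_arrests.values())
missed = sum(total_crimes[z] for z in ZIPS if z not in PATROLLED)
print(f"arrests recorded: {caught}, crimes never observed: {missed}")
```

A classifier trained on the arrest column would predict arrests in the patrolled ZIPs with high precision, while the missed half of the crime never shows up in its evaluation at all.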


You would have to go out of your way, imo, to build a model this stupid. If the claim is that all areas have equal arrest potential, then this should be easily detected in the model. If the modelers were so stupid that they failed to account for the presence of police when estimating the rate of/probability of/total quantity of arrests then sure, they’re just stupid people making stupid models. Or intentionally making stupid models.

But it would be very easy to do something like predicting the probability that a cop makes an arrest given that they went to each area. And if there’s no difference between the areas, it should not matter how many times they went there: the rates should be the same.

It seems likely that the models were right and that it’s way easier to make drug arrests in these areas, which was kind of baked into the original premise. So it’s not clear why the modeling itself is being blamed here.

The problem is the externalities of the policy. Not some model overfitting. Like, would you blink if I told you the probability of being able to make a drug arrest in a poor area was 20% higher? Probably not. Does that need to mean that you only go to the poor area? No.
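To put hypothetical numbers on that (the probabilities and patrol count below are invented): suppose the per-patrol arrest probability really is 20% higher in the poor area. Sending every patrol there maximizes expected arrests, but an even split gives up relatively little.

```python
# Hypothetical per-patrol arrest probabilities: 20% higher in the poor area.
p_poor, p_rich = 0.24, 0.20
patrols = 100

greedy = patrols * p_poor  # send every patrol to the poor area
mixed = 0.5 * patrols * p_poor + 0.5 * patrols * p_rich  # split evenly

print(greedy, mixed)  # the greedy policy wins by only a modest margin
```

The model's output (a rate difference) and the policy decision (where to send everyone) are separate things; the second doesn't follow mechanically from the first.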


> The problem is the externalities of the policy. Not some model overfitting.

Sure, I don't disagree with this. I guess the point is not so much that ML models are bad here in the typical sense (although greedy consultants may be, and IMO likely are, happy to pawn off shitty models to jurisdictions that don't know any better), but more that the underlying system isn't one where predictive modeling is truly going to be "effective" (although, as another commenter pointed out, there are cases where predictive modeling works fairly well, such as pretrial detention risk assessment). The problem as I see it is that model "efficacy" means one thing to the cops and voters ("effective" in the sense of reducing or preventing crime) and another thing to a data scientist ("effective" in the sense of being able to achieve a high F1 score or w/e). These definitions may be correlated, but are not guaranteed to be, and the strength of the correlation is highly dependent on how the model is ultimately used.


That would just depend on a whole host of specifics. Incidentally, the same specifics as for regular statistics, since ML is also statistics and is sensitive to experimental design and sampling just the same.



