'Predictive Policing' Company Uses Bad Stats, Contractually-Obligated Shills To Tout Unproven 'Successes'

from the zoom-in-on-questionable-stats-and...-ENHANCE dept

We’re not quite at the stage where pre-crime divisions are being formed by forward-looking law enforcement agencies. Not yet. But we’re on our way.

SF Weekly has a very thorough report on PredPol, another entry in the “predictive policing” field. According to its website, PredPol’s software (“developed by a team of PhD mathematicians and social scientists” from a variety of California universities) algorithmically determines where crime is likely to occur and cranks out a map highlighting 500′ x 500′ “boxes” that are possible criminal hotspots — places officers should patrol more heavily.

PredPol is currently being deployed by a number of law enforcement agencies worldwide.

The company has sold its proprietary software here and abroad, from Kent County in England to Seattle, Wash., and here in the Bay Area to cities including Richmond, Los Gatos, Morgan Hill, and Santa Cruz.

San Francisco’s PD, despite having an early invitation to test drive the software, has yet to sign a contract with PredPol. Two things are preventing this from moving forward.

From the beginning, the effectiveness of PredPol had been a sticking point in the negotiations.

SFPD’s [CIO Susan] Merritt was skeptical. In a series of e-mails from July 2012 to August 2013, Fowler laid out the technical specifications for the software and the types of crimes PredPol claims to predict. “The crimes we predict are burglary [residential, commercial, auto], auto theft, theft, robbery, assault, battery, and drug crime,” Fowler wrote on July 23, 2012. “This goes significantly beyond your current … mapping tools,” he added. Affixed to [Donnie] Fowler’s e-mail signature was the claim that PredPol’s predictions are “twice as accurate as those made by vet cops…”

Merritt pressed Fowler about whether the program could handle violent crime. “Homicide is a priority in the department — and if it is not there it would just beg the question why not,” she wrote. Fowler admitted at the time that PredPol wasn’t predicting homicides and gun violence.

The skepticism is well-deserved. For one, PredPol simply hasn’t been around long enough to truly compare post-PredPol crime stats with pre-PredPol crime stats. It was still being tested in Santa Cruz and Los Angeles as of fall 2011, and has only been live in certain cities since the beginning of 2012. But that hasn’t stopped it from using mostly worthless year-to-year local crime data as “proof” of its crime fighting power.

On a page of its site audaciously titled "Proven Results," PredPol uses a couple of nearly context-free charts to tout its "success."

This chart, titled “Proven Accuracy,” contains nothing to indicate what the divisions on the y axis represent, and while it touts its accuracy in the title, the sidebar says only this:

Predicted twice as much crime as experienced crime analysts in 6 month randomized controlled trials.

With nothing specified on the axis, PredPol could simply be outperforming analysts by a 20%-to-10% margin. And as the lines proceed along the x axis, the 2-to-1 ratio shrinks, suggesting the more predictions PredPol makes, the less accurate it is. By the time the lines exit the chart, we're looking at only a 9-to-6 [whatever] PredPol "advantage." (Not only that, but the "2x" indicated is actually 3x, at least as far as I can tell from the unlabeled y axis.)

The second chart is nearly as bad.

Here it uses crime stats (oddly, "Crimes per Day") from the Foothill Division of Los Angeles to claim a "13% reduction" in crime using year-to-year data. Looking at the chart shows that the $50,000 (and up) system reduced crime by approximately one (1) crime per day over that time period. Not only that, but the data only covers up to May 2012.

Crime stats published by the Foothill Division tell a different story. While PredPol claims this reduction is a success, additional statistics show the software has had a negligible effect on crime rates. At this point, according to PredPol's own chart, the software has been running since late 2011, so any stats for 2012 and 2013 would be relevant. Some stats will have to be ignored, as PredPol has only specialized in certain crimes since its beginning ("The crimes we predict are burglary [residential, commercial, auto], auto theft, theft, robbery, assault, battery, and drug crime…")

Aggravated assault has seen a pretty steep drop, going from 310 in 2011 to 276 in 2012 and, finally, 242 year-to-date, an overall drop of 22% since PredPol’s addition to the force.

Other stats aren’t quite as cheery. Burglaries have dropped from 668 to 544 over that same time period (down 19%), but only viewing the two-year comparison ignores the fact that numbers have climbed since 2012, when only 494 burglaries occurred, a 10% increase over last year.

From 2011 to 2013, auto theft dropped all of 2%, from 712 to 700, a negligible difference. Unfortunately for PredPol, this is also a 10% increase over 2012 (634). Burglary/theft from motor vehicles has risen 5% over that time period (with a small dip in 2012). “Personal/other Theft” has declined 10% over two years and 16% in the last year.

While there are a couple of improvements, much of what’s here (increases and decreases) can be chalked up to normal statistical fluctuations. It’s not enough to completely rule out PredPol’s usefulness as a crime predictor, but it’s certainly not enough to be declared a “proven result” on its website.

Along with L.A.’s Foothill Division, Santa Cruz (where the pilot program took place) is also repeatedly highlighted on PredPol’s “Press” page. And once again, there’s nothing that conclusively shows PredPol’s worth as a predictive tool.

2012 to 2011 comparisons show an increase in criminal activity year-to-year (raw total: 697, up from 580 — a 20% jump), which would have occurred in the second year of PredPol’s deployment. There has been a drastic reduction in crime so far in 2013, but attributing that drop to PredPol also means attributing the 20% jump. A broader look at total crime statistics for the entire Santa Cruz area shows crime rates have fluctuated (sometimes greatly) with regularity over the past decade, suggesting past “solutions” have failed to produce predictable results. PredPol is presenting itself as just such a solution, but the data — even data pulled from the PDs it touts on its website — fails to bear that out.

As with any algorithm, more time in service means more data, and more data should mean better results. Optimistically, the recent decline in crime in Santa Cruz could be seen as an indicator that PredPol’s software is improving. But PredPol shows no such confidence in its ability to curb criminal activity. Sure, it presents itself on its public-facing website as an indispensable, cutting edge addition to any law enforcement agency’s set of crime-fighting tools, but the way it contractually obligates its customers to toot its horn for it (rather than wait for evidence of actual success) tells a completely different story.

PredPol has required police departments that sign on to refer the company to other law enforcement agencies, and to appear in flashy press conferences, endorsing the software as a crime-reducer — despite the fact that its effectiveness hasn’t yet been proven.

Turning customers into marketers via contractual obligations isn’t the act of a confident company. Even PredPol’s own marketing veers into some shady gray areas.

PredPol distributes news articles about predictive policing’s supposed success in L.A. to dozens of other police departments, implying that the company’s software has been purchased and deployed by the LAPD.

Statements from the LAPD say the opposite.

[I]t isn’t clear exactly whose software LAPD has been using. PredPol’s name does not appear anywhere in L.A.’s predictive policing records, though LAPD personnel say they are using PredPol’s software, and Malinowski’s contact information has appeared in PredPol’s sales literature distributed to other cities. In response to a public records request for contracts between L.A. and PredPol, the LAPD says no such agreements exist.

But this pitch, as shady as it is, still works.

PredPol gave the mayor and city council of Columbia, S.C. — Fowler’s hometown — a “confidential” briefing packet assembled by PredPol’s Brantingham. Inside were slides and graphs illustrating L.A.’s supposedly successful use of predictive policing to reduce crime.

This bit of marketing smoke netted two new contracts for PredPol — which led to even more obligatory PR work.

Swayed by the same claims, the city of Alhambra, just northeast of Los Angeles, purchased PredPol’s software in 2012 for $27,500. The contract between Alhambra and PredPol includes numerous obligations requiring Alhambra to carry out marketing and promotion on PredPol’s behalf. Alhambra’s police and public officials must “provide testimonials, as requested by PredPol,” and “provide referrals and facilitate introductions to other agencies who can utilize the PredPol tool.” And that’s just for starters.

Under the terms of the contract, Alhambra must also “host visitors from other agencies regarding PredPol,” and even “engage in joint/integrated marketing,” which PredPol then spells out in a detailed list of obligations that includes joint press conferences, training materials, web marketing, trade shows, conferences, and speaking engagements.

In addition to feeling used by PredPol’s marketing demands, agencies using PredPol’s software have also expressed concerns about how its reliance on “boxes” is diverting resources away from other needed areas — shifting too much focus onto wherever the algorithm decides today’s hot spots are. As much as some agencies may value the addition of predictive software to their arsenals, the worry remains that the “boxes” will receive an outsized amount of policing while the larger, unmarked areas go underserved.

While PredPol is nowhere near the point it can start targeting individuals before they commit crimes, the fact that a proprietary (and unproven) algorithm has the power (with the right marketing) to reroute police patrols is worrying, especially when no hard evidence has been presented that indicates there’s anything to PredPol’s claims. What’s happening here doesn’t necessarily infringe on anyone’s rights, but when combined with other questionable tactics (say, stop and frisk), it could result in pockets of abuse directly stemming from PredPol’s algorithmic “boxes.”

At best, it’s a marginally useful addition to the many tools PDs already deploy. But it’s certainly not the game changer PredPol presents it as, and its habit of offering selective, limited data as proof, along with contracts that turn customers into unwilling shills, isn’t exactly a confidence booster.



Comments on “'Predictive Policing' Company Uses Bad Stats, Contractually-Obligated Shills To Tout Unproven 'Successes'”

Anonymous Coward says:

you have got to be kiddin’ me! do people, supposedly of high intelligence and earning substantial salaries because, mainly, of their high intelligence in the field they have chosen to work, really believe this crap? they could pay me if they like and i’d probably give just as accurate predictions by looking into a crystal ball, borrowed from a local fairground!! like most things from the UK nowadays, total worthless crap!!

out_of_the_blue says:

"Behavior prediction software company Behavio now part of Google"


Can Google Search Behavior Predict Political Behavior?


Just to remind that your “friend” is creeping all over this.

aldestrawk says:

Re: Fuzzy math

I am not following your logic. One square mile can potentially contain roughly 100 prediction boxes (500 ft x 500 ft). At 500 square miles, LA has a total of about 50,000 potential prediction boxes. If all of them were used, they would indeed be assured of accounting for 100% of crime. The whole point of predictive policing, and hot spot policing in general, is that a small portion of the total area is selected for additional periodic patrols.
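The commenter's box arithmetic checks out. A quick back-of-the-envelope sketch (the ~100-boxes and 500-square-mile figures come from the comment above; the exact totals below just follow from the math):

```python
# Back-of-the-envelope check of the prediction-box arithmetic:
# a mile is 5,280 ft, so one square mile holds (5280/500)^2 boxes of 500 ft each side.

MILE_FT = 5280
BOX_FT = 500

boxes_per_sq_mile = (MILE_FT / BOX_FT) ** 2   # ~112, close to the "roughly 100" figure
la_area_sq_miles = 500
total_boxes = boxes_per_sq_mile * la_area_sq_miles  # ~56,000 potential boxes citywide

print(round(boxes_per_sq_mile))
print(round(total_boxes))
```

Patrolling every box would obviously be pointless; the value, if any, is in picking a small top-ranked subset.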

David Good (profile) says:


Couldn’t they take old statistics (say, ten years worth), plug it into the system and see if it successfully predicts the following year? Like what if they plugged in crime data from 2000 – 2010, then checked to see if it successfully figured out what 2011, 2012 and 2013 were like?

Wouldn’t that be adequate to see if the system worked?

Anonymous Coward says:

In addition to the reasons given above...

…we may immediately dismiss this software on inspection. It’s clearly fraudulent. It’s snake-oil. It’s crap peddled by liars.


Because anyone who thinks about the distribution of crime in ANY city for more than 30 seconds will realize that in no way, shape or form does it map to a grid. The suggestion that this is a valid modelling technique is absurd and not only can be laughed out of the room, but should be.

I’m sure the people behind this software will dispute that, since of course their profits depend on hoodwinking the gullible morons who staff every police department in the country. Admittedly, these are people of far-below-normal intelligence and thus they make easy marks. But they could easily disprove my statements with one simple step:

Publish the source code.

Failure to publish the source constitutes a full confession that they’re lying.

John Fenderson (profile) says:

Re: In addition to the reasons given above...

will realize that in no way, shape or form does it map to a grid. The suggestion that this is a valid modelling technique is absurd and not only can be laughed out of the room, but should be.

There’s nothing inherently silly about mapping to a grid. In the end, that’s how pretty much all such modelling is done. The only issue would be grid size. The smaller each cell is, the more accurate the model.

For example, your computer monitor maps everything you see, including complex pictures, to a grid, but you can still see a clear image.

Anonymous Coward says:

Re: Re: In addition to the reasons given above...

If PredPol uses any sort of actual probability analyses at all, they’re probably basing it on some sort of Poisson distribution across a grid. The ‘boxes’ on their maps seem to confirm this; if they were using other methods, they’d probably wind up with circles (as in a hot-spot point plus some margin of error).

The problem, as I see it, is that if you divide any urban area into arbitrary grid squares, you’ll wind up with incredibly variable conditions within each square. For instance, one square might contain several 30-story housing projects, while another square might contain a shipyard and a lot of water. In theory it might be possible to control for things like population density or levels of economic disparity, but there’s no evidence that PredPol is doing this.
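The Poisson-over-a-grid idea the commenter attributes to PredPol (an assumption about its internals, not documented fact) is simple to sketch: each cell gets its own rate λ, and the day's "boxes" are just the cells with the highest rates. All rates below are invented for illustration:

```python
# Sketch of the grid-with-Poisson guess above. Each grid cell gets an
# independent Poisson rate; the probability of at least one crime in a
# day is P(N >= 1) = 1 - exp(-lambda). The top-ranked cells become boxes.
import math

# Hypothetical daily crime rates per grid cell (crimes/day), made up.
cell_rates = {"A1": 0.8, "B2": 0.3, "C3": 0.05, "D4": 1.5}

def p_at_least_one(lam):
    """Poisson probability of one or more events in a unit interval."""
    return 1 - math.exp(-lam)

# Rank cells; the top-k become the day's prediction boxes.
ranked = sorted(cell_rates, key=cell_rates.get, reverse=True)
print(ranked[:2])  # ['D4', 'A1'] -- the two hottest cells
for cell in ranked:
    print(cell, round(p_at_least_one(cell_rates[cell]), 3))
```

This also makes the commenter's objection concrete: nothing in this setup knows whether "D4" is a housing project or open water; the rate is all the model sees.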

It’s fairly obvious that if PredPol has any statisticians or mathematicians on staff, they’re mostly being used for producing misleading marketing cruft.

aldestrawk says:

Re: Re: Re: In addition to the reasons given above...

I don’t see why you need to use the margin of error to affect the shape of the box. The margin of error can be a separate value while holding the shape to a square. A patrol just has to include the whole box while traveling streets that don’t necessarily align with the box anyway. Including territory just outside of the box doesn’t invalidate the conformance of a patrol to the Koper Curve Principle. The only difference that variability in the nature of structures contained in the box makes is how the patrols take place (i.e. whether by vehicle, on foot, or by boat).

aldestrawk says:

There is an important aspect of predictive policing that is not mentioned in any of the press releases or press articles. I am guessing the reason it isn't mentioned is that law enforcement is afraid its effectiveness would be diminished if it were common knowledge. Unfortunately, leaving it out skews our understanding of predictive policing and leads to false ideas, such as the notion that the software can predict a crime at a particular spot and a particular time. If this missing aspect, the Koper Curve Principle, were explained, it would probably lead to greater public acceptance and less skepticism.

The Koper Curve Principle is used in association with any type of “hot spot” type patrolling. It basically says that periodic, and highly visible, patrols of an area that are 12-16 minutes in length maximize the use of police resources. Crime will be reduced in that area as a result of criminals noticing what seems to be an increase in police presence. This effect will differ depending on the crime and obviously, will have little effect on crimes of passion. At any rate, predictive policing is not a stakeout within a prediction box.
The following is a recent study about the effectiveness of various police practices.

Why is this software package any better than a veteran cop's intuition or any other statistical analysis of crime data? The advantage over intuition is that it avoids cultural or emotional biases, adapts to changes more quickly than a human would, and can be more easily communicated to officers who are not veterans of a particular police department. Since the data proving effectiveness isn't there yet, we can't know whether this secret algorithm is any better than some other mix of statistical analysis. This secrecy is the snake-oil part. If the multitude of PhDs had truly figured out a unique statistical analysis, they would have patented it and secrecy would not be part of the package. All I can see is that they update the system daily with yesterday's statistics, the algorithm takes this new data and applies Bayesian inference, and the result is fed into GIS software to produce convenient maps. That is not too hard to reproduce. Don't dismiss the idea that predictive policing might be effective, but do be skeptical that PredPol has the only true answer. PredPol's hard-sell approach certainly makes me more skeptical of them.
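The "daily update plus Bayesian inference" the commenter guesses at has a textbook minimal form: a Gamma prior on each cell's Poisson rate, updated by each day's observed count. This is the standard Gamma-Poisson conjugate pair, offered only as a sketch of the guess, not PredPol's actual method:

```python
# Hedged sketch of a daily Bayesian rate update for one grid cell:
# Gamma(alpha, beta) prior on the cell's Poisson crime rate, updated
# by yesterday's count. Textbook conjugate update; parameters invented.

def update_rate(alpha, beta, crimes_yesterday, days=1):
    """Posterior Gamma parameters after observing a day's count."""
    return alpha + crimes_yesterday, beta + days

def posterior_mean(alpha, beta):
    """Expected crimes/day under the current Gamma belief."""
    return alpha / beta

# Weak prior of ~0.5 crimes/day for this cell.
alpha, beta = 1.0, 2.0
for count in [0, 2, 1, 3]:        # four days of observed counts
    alpha, beta = update_rate(alpha, beta, count)

print(round(posterior_mean(alpha, beta), 3))  # (1+6)/(2+4) = 7/6 ~ 1.167
```

Run over every cell each night and fed to a map layer, something this simple would reproduce the workflow the commenter describes.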

aldestrawk says:

proven accuracy chart

I am guessing the first chart's x-axis ranges from 0%-90% predictive accuracy. I think they cut off the upper 10% because they wouldn't have the data for that part. The curve would likely be asymptotic towards 100% accuracy as the boxes are increased from 100 to the number of boxes covering an entire city. Anyway, including more boxes would stretch police resources too much and contradict the whole point of hot spotting.
A better description of that graph would be to say that PredPol's software is 20%-25% more accurate in predicting the locality of crime than a crime analyst would be, using unspecified statistical methods.

aldestrawk says:

Re: proven accuracy chart

I dug into this and found a useful video of a lecture by George Mohler who is the chief scientist for PredPol.
Fair Warning! this video is rather technical and assumes familiarity with the subject.

He talks about this chart at 30 min. into the video. The graph compares the LAPD crime analyst's hot spot generation with that generated by PredPol's algorithm, which uses a semi-parametric self-exciting point process. The x axis represents the number of crimes (or possibly the percentage of a day's total crimes) that happen within all hot spots. Their repeated claim that PredPol's algorithm is twice as accurate as an analyst is really only valid when generating 20 hot spots. At higher numbers of hot spots this ratio falls, but is still better in all cases. This is data for a six-month period in the Foothill division of the LAPD.

The video eliminates the secrecy of all this. Mohler, in fact, points out that you can take the same equations he lists along with crime data and write a program to generate hot spots (prediction boxes) yourself. One critical point is that the algorithm differs from traditional hot spot generation in that it is predictive rather than just reflecting past crime activity.
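The "self-exciting point process" Mohler describes is the same family as the Hawkes processes used in earthquake aftershock modeling: each past event temporarily raises the rate of new events. A minimal time-only sketch (the spatial kernel is omitted and all parameters are invented; this is an illustration of the family, not Mohler's fitted model):

```python
# Minimal sketch of a self-exciting (Hawkes) conditional intensity in
# time only. Background rate MU plus exponentially decaying "kicks"
# from each past crime. Parameters are made up for illustration.
import math

MU = 0.2      # background rate (crimes/day)
ALPHA = 0.5   # boost each past crime contributes
OMEGA = 1.0   # decay rate of that boost (1/days)

def intensity(t, past_events):
    """Conditional intensity at time t given past event times."""
    boost = sum(ALPHA * OMEGA * math.exp(-OMEGA * (t - s))
                for s in past_events if s < t)
    return MU + boost

events = [1.0, 1.5, 2.0]  # days on which crimes occurred
print(round(intensity(2.1, events), 3))  # elevated just after a cluster
print(round(intensity(9.0, events), 3))  # decayed back toward MU = 0.2
```

The "predictive" character the comment mentions falls out of this structure: recent clusters inflate the near-term rate, so the hottest boxes lead recent activity rather than merely mapping it.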

aldestrawk says:

Big Data?

A final point on predictive policing. A number of media articles treat this as an application of "Big Data." It is not big data. Mohler says that he needs 1,200 to 2,000 crime data points to generate prediction boxes with a reasonable amount of accuracy. That is the size of data set used in just about any statistical analysis, and is nowhere near what is considered big data. The reason murder is not included is that cities don't usually have enough data points for murder (God, what an awful euphemism this is, as I write it).

Cloudsplitter says:

You could do as good or better police prediction with a Ouija board, 10 bones, and the hair of a goat. What a load. It just goes to show that cops and cop administrators are the biggest rubes around. You couldn't sell this shit to a classroom full of kindergartners for their milk and cookie money, yet these police departments are buying it, and then letting themselves be used in a dog and pony show to attract other victims. I would demand a blind test for proof that this crap is any better than flipping a coin, and when it fails, throw those bastards in jail for fraud, where they belong.

aldestrawk says:

Re: Re:

The 6-month test in the Foothill Division of the LAPD was a blind test. The police received maps every day, but they didn't know whether a map was generated by PredPol or the LAPD crime analyst. Unfortunately, PredPol has obfuscated the results in their "proven accuracy" chart. It is a positive result for their software, but not as positive as it leads you to believe. The accuracy part didn't need the blind setup, but it will be useful for the efficacy study. They will have to wait years for such a study to yield accurate results; there is no way around that. Still, that doesn't mean the software isn't useful. It is way overpriced, though. I am thinking about writing my own implementation and selling it at only $10K a pop, one-time sale, no SaaS here.

That One Guy (profile) says:

Re: Re: Re:

Unfortunately, PredPol has obfuscated the results in their “proven accuracy” chart.

See, that's the thing: if it really did work as advertised, you'd think they would have no problem putting out verifiable data demonstrating it. The fact that they apparently went out of their way to hide or inflate that info suggests they're being less than honest when selling, and don't believe the product would hold up under honest scrutiny.

CorruptionBuster says:

Politician Ryan Coonerty (Santa Cruz) has MAJOR conflict of interest with PredPol

Ryan Coonerty, who is currently on the Santa Cruz Board of Supervisors, has had an ongoing, major conflict of interest in supporting and working for PredPol. He is/was functioning as their President for a time; most likely has stock options or other financial incentives to push their dishonest products; and uses inappropriate influence with police unions and other public entities to improperly profit from any advances that PredPol makes. Most or all of the involved local players are profiteering Demo party activists: Coonerty, Caleb Baskin, Nate Atkinson, Zach Friend (another current Supervisor SC County), and others. PLEASE take the time to research this yourself, and make your voice heard.
