from the can-you-fix-the-people-performing-the-tests? dept
Everything everyone saw in cop shows as evidence linking people to crimes — the hair left on someone’s clothing, the tire tracks leading out to the road, the shell casings at the scene, etc. — is all proving to be about as factual as the shows themselves.
While much of it is not exactly junk science, much of it has limited worth. What appears to indicate guilt carries enough of a margin of error that it could very easily point the other way. Science Magazine is taking a look at the standbys of forensic science and what’s being done to ensure better presentations of evidence in the future.
On a September afternoon in 2000, a man named Richard Green was shot and wounded in his neighborhood south of Boston. About a year later, police found a loaded pistol in the yard of a nearby house. A detective with the Boston Police Department fired the gun multiple times in a lab and compared the minute grooves and scratches that the firing pin and the interior of the gun left on its cartridge casings with those discovered on casings found at the crime scene. They matched, he would later say at a pretrial hearing, “to the exclusion of every other firearm in the world.”
So how could the detective be sure that the shots hadn’t been fired from another gun?
The short answer, if you ask any statistician, is that he couldn’t. There was some unknown chance that a different gun struck a similar pattern. But for decades, forensic examiners have sometimes claimed in court that close but not identical ballistic markings could conclusively link evidence to a suspect—and judges and juries have trusted their expertise. Examiners have made similar statements for other forms of so-called pattern evidence, such as fingerprints, shoeprints, tire tracks, and bite marks.
Six years ago, the National Academy of Sciences found that these forensic standbys had a much larger margin of error than was portrayed in court by detectives and expert witnesses. It recommended the margin of error be delivered along with the testimony to head off future verdicts based on faulty evidence.
To date, not much has changed. While actual junk science like bite marks has largely been discarded by prosecutors, the others remain, even as their reliability has been constantly questioned. The FBI loved hair analysis, right up to the point that it determined its witnesses had overstated test results 90% of the time in the two decades prior to 2000.
Even fingerprints, which have long been considered unassailable because of their supposed uniqueness, aren’t much better. Part of the problem is the presumption that every fingerprint is so distinctive that even a partial print can eliminate suspects. The rest of the problem lies with the people matching the prints.
One study of 169 fingerprint examiners found 7.5% false negatives—in which examiners concluded that two prints from the same person came from different people—and 0.1% false positives, where two prints were incorrectly said to be from the same source. When some of the examiners were retested on some of the same prints after 7 months, they repeated only about 90% of their exclusions and 89% of their individualizations.
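To see why those error rates matter so much in court, here is a rough Python sketch that runs the study's two numbers through Bayes' rule. It assumes the 7.5% and 0.1% figures can be read as per-comparison error rates, which the quoted passage does not spell out. The prior probabilities are entirely hypothetical and chosen only for illustration, and the function name is my own.

```python
# Error rates quoted from the study above. Treating them as
# per-comparison conditional rates is an assumption.
FALSE_NEGATIVE_RATE = 0.075  # same source, examiner says "different"
FALSE_POSITIVE_RATE = 0.001  # different source, examiner says "same"


def posterior_same_source(prior, false_positive_rate, false_negative_rate):
    """Bayes' rule: P(same source | examiner reports a match)."""
    sensitivity = 1.0 - false_negative_rate
    p_reported_match = (
        sensitivity * prior + false_positive_rate * (1.0 - prior)
    )
    return sensitivity * prior / p_reported_match


# How much more likely a reported match is if the prints share a source.
likelihood_ratio = (1.0 - FALSE_NEGATIVE_RATE) / FALSE_POSITIVE_RATE

if __name__ == "__main__":
    print(f"likelihood ratio: {likelihood_ratio:.0f}")
    # Hypothetical priors: a coin flip, 1 in 100, and 1 in 10,000.
    for prior in (0.5, 0.01, 0.0001):
        post = posterior_same_source(
            prior, FALSE_POSITIVE_RATE, FALSE_NEGATIVE_RATE
        )
        print(f"prior {prior:.4%} -> posterior {post:.1%}")
```

Under these assumptions, a reported match is about 925 times more likely if the prints really do share a source. That sounds overwhelming, but if the suspect started out as a 1-in-10,000 candidate (say, the print was run against a large database), the chance the match is real comes out to roughly 8%. The number attached to the evidence depends heavily on everything else known about the case, which is part of what makes a single "reliability rating" hard to produce.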
NIST has given $20 million to the Center for Statistics and Applications in Forensic Evidence (CSAFE) to come up with a better way to present this sort of evidence, one that clearly accounts for any uncertainties in the results or processes. CSAFE is still trying to figure out how to express this as a number or rating. But that might not be the only problem. The other issue is that specifics about forensic reliability may not play much of a part when judges and juries decide guilt or innocence.
In a 2013 study, for instance, online participants had to rate the likelihood of a defendant’s guilt in a hypothetical robbery based on different kinds of testimony from a fingerprint examiner. It didn’t seem to matter whether they were simply told that a print at the scene “matched” or was “individualized” to the defendant, or whether the examiner offered further justification—the chance of an error is “so remote that it is considered to be a practical impossibility,” for example. In all those cases, jurors rated the likelihood of guilt at about 4.5 on a 7-point scale. “As a lawyer, I would have thought the specific wording would have mattered more than it did,” Garrett says. But if subjects were told that the print could have come from someone else, they seemed to discount the fingerprint evidence altogether.
The other part of the problem is the people who perform the tests. Multiple incidents in which evidence was falsified or not properly tested have been uncovered. The evidence is only as good as the processes, and if steps are skipped because of sloppiness or laziness, the evidence’s credibility becomes highly questionable, not just for the specific instance where results were faked, but for every test the person responsible has touched.
There’s no possible way to eliminate honest errors, much less prevent anyone from falsifying results. In both cases, the problems are caught after the damage has been done. Humans are the most unpredictable part of the chain of evidence but also an irreplaceable part. CSAFE will be working with forensics labs to create best practices, but it can do nothing to prevent the lazy and/or incompetent from completely ignoring the proper steps.
Problems are also present higher up the chain. When bad science or bad practices result in questionable evidence, it’s often extremely difficult to have the resulting convictions overturned.
What’s troubling, [federal judge Nancy] Gertner says, is that when judges accept junk science, an appeals court rarely overrules them. Attaching a numerical probability to evidence, as CSAFE hopes to do, “would certainly be interesting,” she says. But even a standard practice of critically evaluating evidence would be a step forward. “The pattern now is that the judges who care about these issues are enforcing them, and the judges who don’t care about these issues are not.”
In this way, the courts are no better than labs where shoddy work is done. Variations in personality undermine the dispassionate nature of science, leaving outcomes dependent on human prejudices rather than the strength of the evidence itself.