from the poll-position dept
Let's jump right into this, because this post is going to be a bit on the wonky side. It's presidential silly season, as we have said before, and this iteration of it is particularly bad, like a dumpster fire that suddenly has a thousand gallons of gasoline dropped onto it from a crop-duster flown by a blind zombie. Which, of course, makes it quite fascinating to watch for those of us of an independent persuasion. Chiefly interesting to me is watching how the polls shift and change with each landmark on this sad, sad journey. It makes poll-aggregating groups, such as the excellent Project FiveThirtyEight, quite useful in getting a ten-thousand-foot view of how the public is reacting to the news of the day.
But sites like that obviously rely on individual polls in order to generate their aggregate outlooks, which makes it interesting to understand, at least at a high level, just how these political polls get their results. And, if you watch these things like I do, you have probably been curious about one particular poll, the U.S.C. Dornsife/Los Angeles Times Daybreak poll, commonly shortened to the USC/LAT poll, which has consistently put out results on the presidential race that differ significantly from other major polls. That difference has generally amounted to wider support for Donald Trump in the race, with specific differences in support for Trump among certain demographics. To the credit of those who run the poll, they have been exceptionally transparent about how they generate their numbers, which led the New York Times to dig in and try to figure out the reason for the skewed results. It seems an answer was found and it's gloriously absurd.
There is a 19-year-old black man in Illinois who has no idea of the role he is playing in this election. He is sure he is going to vote for Donald Trump. Despite falling behind by double digits in some national surveys, Mr. Trump has generally led in the USC/LAT poll. He held the lead for a full month until Wednesday, when Hillary Clinton took the nominal lead. Our Trump-supporting friend in Illinois is a surprisingly big part of the reason. In some polls, he's weighted as much as 30 times more than the average respondent, and as much as 300 times more than the least-weighted respondent.
Alone, he has been enough to put Mr. Trump in double digits of support among black voters. He can improve Mr. Trump's margin by 1 point in the national survey, even though he is one of around 3,000 panelists.
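To get a feel for how one heavily weighted respondent can move a national number like that, here's a minimal sketch of a weighted poll average. All of the numbers below are synthetic illustrations chosen to roughly match the scale described (about 3,000 panelists, one respondent weighted 30 times the average), not the poll's actual data:

```python
# Sketch: how one heavily weighted respondent shifts a weighted poll average.
# Every number here is a synthetic illustration, not the poll's actual data.

def weighted_trump_share(votes, weights):
    """Weighted share of respondents voting Trump (votes: 1 = Trump, 0 = not)."""
    return sum(v * w for v, w in zip(votes, weights)) / sum(weights)

n = 3000                    # roughly the size of the poll's panel
votes = [0] * n
votes[:1400] = [1] * 1400   # hypothetical ~46.7% Trump support in the panel
weights = [1.0] * n         # everyone weighted equally at first

baseline = weighted_trump_share(votes, weights)

# Now flip one otherwise-ordinary respondent to Trump and give him
# 30x the average weight, as described in the NYT analysis.
votes[-1] = 1
weights[-1] = 30.0
skewed = weighted_trump_share(votes, weights)

print(f"baseline: {baseline:.1%}, with one 30x respondent: {skewed:.1%}")
```

In this toy setup, the single up-weighted respondent adds roughly half a point to Trump's share, which is about a full point of Trump-versus-Clinton margin, since the same support is effectively subtracted from the other side.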
So, how does one person manage to skew a major national political poll in favor of one candidate to the tune of entire percentage points? Well, it turns out that a confluence of factors converged to pretty much mess everything up: who is included in the poll and how often, how the poll respondents are weighted, and how this one particular voter fits into the demographic weighting. Let's start with the weighting.
The USC/LAT poll does things a bit differently than the other national polls. All polls weight respondents by demographics to correct for voting tendencies. The math can get gory and the NYT post does a good job of going through it, but you can think of it like this, for a very imprecise example: a poll respondent from the 18-35 demographic will be weighted less than a respondent from the 36-55 demographic, because the latter demo is more likely to actually show up and vote than the former. There is indeed some subjectivity in this, but the large demographic weighting drives the error margin down for the most part. But the USC/LAT poll deviates from the large-demo weighting and instead weights at very small demographic levels.
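The core ratio behind that weighting step can be sketched in a few lines: each respondent's weight is roughly his or her demographic cell's share of the target population divided by that cell's share of the sample. The cell definitions and percentages below are hypothetical, and real polls also fold things like turnout likelihood into the weights, so treat this as the imprecise-example version:

```python
# Sketch of basic demographic cell weighting:
#   weight = cell's population share / cell's sample share
# Cells and percentages are hypothetical, not any real poll's targets.

population_share = {"18-35": 0.30, "36-55": 0.35, "56+": 0.35}

# A sample that over-represents the young, as online panels often do.
sample = ["18-35"] * 500 + ["36-55"] * 300 + ["56+"] * 200

sample_share = {cell: sample.count(cell) / len(sample) for cell in population_share}
weights = {cell: population_share[cell] / sample_share[cell] for cell in population_share}

for cell, w in weights.items():
    print(f"{cell}: weight {w:.2f}")
```

Each over-represented group gets a weight below 1 and each under-represented group a weight above 1, so the weighted sample matches the population shares again.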
The USC/LAT poll weights for many tiny categories: like 18-21 year old men, which the USC/LAT poll estimates make up around 3.3 percent of the adult citizen population. Weighting simply for 18-21 year olds would be pretty bold for a political survey; 18-21 year old men is really unusual...When you start considering the competing demands across multiple categories, it can quickly become necessary to give an astonishing amount of extra weight to particularly underrepresented voters -- like 18-21 year old black men.
Which is how our friend in Illinois, a 19-year-old black man, became the poll's most heavily weighted voter. The heavy weighting on tiny demographic categories caught him several times and, since he is voting for Trump, despite his demographic generally not voting for Trump, his heavily-weighted response skews things wildly. But that isn't all.
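A toy illustration of why tiny cells produce such outsized weights, using the same population-share-over-sample-share ratio as above (all shares hypothetical): when a cell is a sliver of the population but contains only one panelist, that ratio explodes.

```python
# Toy illustration: the smaller the weighting cell, the more a single
# under-sampled respondent gets amplified. Shares here are hypothetical.

def cell_weight(pop_share, n_in_cell, n_total):
    """Weight for each respondent in a cell: population share / sample share."""
    return pop_share / (n_in_cell / n_total)

n_total = 3000

# Broad cell: say 12% of the adult population, with 300 panelists in it.
broad = cell_weight(0.12, 300, n_total)

# Narrow intersectional cell: say 0.5% of the population, with a
# single panelist falling into it.
narrow = cell_weight(0.005, 1, n_total)

print(f"broad cell weight: {broad:.1f}, narrow cell weight: {narrow:.1f}")
```

The broad cell's lone respondents get a modest nudge; the narrow cell's single respondent carries an order of magnitude more weight, and stacking several such small-category corrections on one person compounds the effect further.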
The USC/LAT poll does something else that's really unusual: it weights the sample according to how people said they voted in the 2012 election. The big problem is that people don't report their past vote very accurately. They tend to over-report three things: voting, voting for the winner and voting for some other candidate. They underreport voting for the loser. By emphasizing past vote, they might significantly underweight those who claim to have voted for Mr. Obama and give much more weight to people who say they didn't vote.
Which, again, catches our friend from Illinois. At nineteen, he obviously didn't vote in the last election. So his response is weighted even more. Using the poll's own data, the New York Times re-ran the poll using the same broad categories most other major polls use. When done, Hillary Clinton led in every single one of the iterations except for the one immediately following the GOP convention. The difference between the poll's results as reported and what they would be with the normal weighted categories and the omission of the past-vote weighting ranged from 1-4 points. In a political poll, that's enormous.
The final factor here is that the USC/LAT poll is a panel poll, which means that the same respondents are used each time the poll is run. So, our young black Trump-voting man from Illinois got to skew these results nearly each and every time. The one time he failed to respond to the poll, Hillary Clinton suddenly led within it. As the NYT notes:
The USC/LAT poll had terrible luck: the single most overweighted person in the survey was unrepresentative of his demographic group. The people running the poll basically got stuck at the extreme of the added variance.
And, of course, the poll aggregators might include this poll, skewing the aggregate numbers as well. This isn't to say that all polls are skewed in the same manner. They aren't. The reason this is a story is because this poll is the outlier. But it is kind of fun to see how badly the sausage can be made if the methodology isn't in tune.