Polls Fail Due to Poor Oversampling – Democratic Complacency Misplaced

Today Joe Biden becomes the 46th president of the United States of America. He was widely expected to achieve a landslide victory in the election last year, with the polling placing him around 8 points ahead. It was assumed that the pollsters must have learnt from the errors of 2016, and therefore would be accurate. Maybe Biden won’t carry Texas, commentators conjectured, but he’ll sweep the swing states. Then the results came in.

Biden may still have won convincingly in the electoral college with 306 votes (comfortably over the 270 needed for a win), but the states which eventually pulled him over the line – Arizona, Georgia and Wisconsin – were all taken by mere fractions of a percent. An average polling lead of 8.4 points shrivelled to less than 4.5 points in the election, and with that, bold claims of a blue landslide evaporated. As Biden becomes president, the Democratic majority in the House is smaller than the one won in 2018, and the Senate is held on a literal 50:50 knife-edge – the weakest of all-chamber victories. But it’s still not entirely clear why predictions ended up so off the mark for a second consecutive presidential election.

A common theory bandied around is the notion of the ‘shy Trump voter’, akin to Nixon’s ‘silent majority’. Pollster Robert Cahaly has been one of its most vocal proponents, and his predictions were more accurate than other pollsters’ in some states. However, his results aren’t necessarily due to his shy Trump voter theory, and they consistently overestimated Trump’s support relative to Biden’s. If a sizeable chunk of shy Trump voters were the major factor, one would expect an equally big polling miss in 2016, but that’s a myth: Clinton was predicted a 4-point lead and ended up with a 2-point lead in the popular vote.[i] That error was largely due to inaccurate polling in crucial Midwestern states. Would Trump voters be notably ‘shy’ in those states alone? Or would they have become markedly more ‘shy’ over the intervening 4 years? More likely, a bigger factor behind Trump’s higher-than-expected 2016 Midwestern performance was pollsters’ failure to anticipate a rise in turnout among non-college-educated voters.

Where Cahaly’s polling gets more interesting is in his insistence on keeping polls short. One of the more intriguing theories for the error in polling, and why it became especially pronounced in 2020, is that the likelihood of completing an extended survey has become correlated with party support. As Trump has continually undermined his supporters’ faith in media institutions over the last 4 years, and outrage in liberal circles has grown over his presidency, data analyst David Shor estimates that Republican supporters became less likely – and enthusiastic, media-trusting Democratic supporters more likely – to spend the time on long polling surveys. Just as education became a big predictor of leaning, it’s possible that the act of completing a long poll has become a slight predictor of one’s voting inclination. The fixation on estimating this ethnic vote or that, as though large ethnic groups are uniform blocs with identical perspectives, results in pollsters falling behind the curve when it comes to new predictors of voting intention, as with education in 2016 and perhaps willingness to answer polls today.

That might be a notable factor, but it’s also worth looking at the basics of polling methodologies. I assumed that these must have been nailed down over decades of opinion polling, but then a friend found that adjusting some polls to weight for party ID in line with the general population resulted in much lower leads for Biden than the polls forecast.[ii] I had a look at the final polls, and it transpires that adjusting for party ID, whilst sometimes only changing the lead slightly, resulted in much more accurate results in numerous polls.[iii] My sample is not exhaustive, but it makes sense that the best predictor of how one votes would be party ID,[iv] and therefore pollsters should ensure that they don’t oversample one group of party supporters, as several of these polls did. Provided there isn’t anything I’ve overlooked, it beggars belief that pundits didn’t identify this and cool expectations of a comfortable Democratic victory.
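As a rough sketch of what adjusting for party ID involves, the Python below recomputes a poll’s headline lead after swapping the sample’s party make-up for a target make-up. Every number in it is made up for illustration: `VOTE_BY_PARTY`, `SAMPLE_MIX`, `TARGET_MIX` and the `biden_lead` helper are hypothetical names and values, not figures from any of the polls discussed here, and real pollsters weight on several variables at once rather than party ID alone.

```python
# Hypothetical illustration: none of these numbers come from a real poll.

# Percentage of each party-ID group backing each candidate: (Biden, Trump).
# Assumes roughly nine in ten partisans back their own party's candidate.
VOTE_BY_PARTY = {
    "Democrat": (90.0, 8.0),
    "Republican": (8.0, 90.0),
    "Independent": (50.0, 42.0),
}

# Party-ID make-up of the poll's respondents, in percent (made up).
SAMPLE_MIX = {"Democrat": 38.0, "Republican": 30.0, "Independent": 32.0}

# Party-ID make-up the poll is meant to represent, in percent (made up;
# in practice this would come from an external benchmark survey).
TARGET_MIX = {"Democrat": 35.0, "Republican": 31.0, "Independent": 34.0}


def biden_lead(vote_by_party, party_mix):
    """Return the Biden-minus-Trump lead in points for a given party mix."""
    total = sum(party_mix.values())
    lead = 0.0
    for party, size in party_mix.items():
        biden, trump = vote_by_party[party]
        # Each group contributes its own lead, scaled by its share of the whole.
        lead += (size / total) * (biden - trump)
    return lead


sample_lead = biden_lead(VOTE_BY_PARTY, SAMPLE_MIX)
reweighted_lead = biden_lead(VOTE_BY_PARTY, TARGET_MIX)

print(f"Lead with the sample's party mix: {sample_lead:.1f} points")
print(f"Lead with the target party mix:   {reweighted_lead:.1f} points")
```

With these invented inputs, trimming a three-point Democratic oversample cuts the lead from about 9 points to about 6. The adjustment is only as good as the benchmark chosen for the target mix, so this shows the mechanism rather than the size of the effect in any actual poll.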

However, many pundits misplace the emphasis when assessing final polling results, judging polls on whether they called the right candidate rather than on the margin by which they erred. They might rush to suggest that polling was terrible in 2016, and slightly better in 2020, even as the error more than doubled in 2020. Many overlook outcomes that even the imperfect polls indicate are notable likelihoods[v] and are liable to miss trends as they observably develop, for example largely failing to notice the tightening of the results in Arizona.[vi] So it is conceivable that issues with poll weightings were overlooked too.

If this is all valid, then it paints a very bleak picture for the prospects of real understanding between the US population at large and the so-called political class. In a country already facing fundamentally distinct perspectives, with a population dispersed across the width of a continent, reliable and well-interpreted data is a key means of understanding the differences and similarities in opinions and conditions. Without that, misunderstanding will likely grow. A major goal of politics ought to be to improve people’s lives, and although it’s true that without a degree of electoral success one’s ability to change things is constrained, as long as there are marked rifts of understanding between a party and a large segment of the population, it seems implausible that that party will expand its base significantly. In the Democrats’ case, that will mean no supermajorities for constitutional amendments, and it will mean weak majorities unable to pass numerous much-hyped reforms.

But that’s a pretty arrogant case for a party to increase its understanding of its country. It’s a party assuming that it already has all of the answers, if only people would vote for it. But if polling and the interpretation of polling can be modified to obtain a better grasp of the wants and needs of as many people as possible, then policy makers will have a much stronger hand with which to improve voters’ lives. If politicians and their supporters are in the business of bettering society, that should be a compelling argument for a re-evaluation of the current approach to politics.

To realise that goal, scrutiny needs to come to the fore in place of complacency. The events of recent days have further exposed the fractious nature of US politics today, but it’s worth remembering that this partisan polarisation did not emerge from a vacuum. Whilst Republicans tainted Obama’s administration with legislative standstill for party gain, some Democrats took notes. The Pandora’s box of undermining election results was opened in 2016 by calls for electors to defect from Trump, and by Pelosi’s insistence that the 2016 election was hijacked before any investigation into Russian links had been carried out. Trump has taken this tactic to new, dangerous extremes, but it didn’t appear overnight.

Now that the Democratic Party is, by the skin of its teeth, assuming control of both legislative chambers and the executive, it has an opportunity to build policies and approaches that will deliver for many, and thus help to reduce polarisation and instability. This will require looking out beyond Capitol Hill with clear glasses, including a nuanced and piercing look at polling – smugness over a narrow victory and the blind pursuit of partisan vendettas can only end in tears.

[i] Moreover, given that the revelations by James Comey influenced the election at such a late stage, it seems reasonable that the polls may have had trouble picking up the full extent of the last-minute swing as the news continued to percolate through the electorate.

[ii] He looked at the polls below, amongst others, and found that adjusting the weighting in line with recent Gallup polls on the party ID make-up of the population gave notably smaller leads for Biden.



[iii] I looked at the polls below, finding notable Democratic oversampling in six. These inflated the final polling average.


I had to omit a few near-election polls which didn’t release the data necessary to assess their sampling.

[iv] Polls generally had around 9/10 Democrats voting for Biden and 9/10 Republicans voting for Trump.

[v] Nate Silver points out that it’s normal for polls to get the wrong point margin by 4 points or so, whilst conceding that this isn’t an ideal situation. He points out that polls give a pretty wide range of possible results, finding that polling should neither be taken as gospel nor thrown out as entirely useless. Arguing for a more nuanced and realistic interpretation of polling data is undoubtedly important, but this doesn’t mean that scrutiny of polling methodologies couldn’t produce better polls, especially in the case of simple mis-sampling errors (as I imagine Silver would agree).

[vi] It was predictable that Arizona was going to come down to a fraction of a percent by considering where votes were left to be counted and how many there were. When the race was called by AP and Fox, a swing of just 1.3 points on the remaining ballots (as they turned out to be) would have produced a Trump win in the state – Arizona was called prematurely and its narrowness was widely overlooked even as the results developed.