by Spencer Kimball, Emerson College; Camille Mumford, Emerson College; and Matt Taglia, Emerson College [This article first appeared in The Conversation, republished with permission]
As the U.S. presidential election approaches, news reports and social media feeds are increasingly filled with data from public opinion polls. How do pollsters know which candidate is ahead in what swing state or with which key demographic group? Or what issues are most important to as many as 264 million eligible voters across a vast country?
In other words: How do pollsters do what they do?
At Emerson College Polling, we lead a dynamic survey operation that, like many others, has continuously evolved to keep pace with shifting trends and technologies in survey research. At the inception of survey research – about 100 years ago – data was primarily collected through mail and in-person interviews. That’s not true nowadays, of course.
In the early days of the survey industry, being asked to participate in a poll was novel, and response rates were high. Today, we’re bombarded with survey requests via email, text, online pop-ups, and phone calls from unknown numbers. With fewer landlines, busy parents juggling work and family, and younger adults who rarely answer calls, preferring text communication, it has become much harder to engage respondents. This shift in behavior reflects the evolving challenges of reaching diverse populations in modern survey research.
Evolution of data collection
In the broadest possible terms, polls and surveys have two elements – choosing whom to contact, and reaching them in a way that’s likely to get a response. These elements are often intertwined.
In the 1970s, after household telephones had become widespread in the U.S., survey operators adopted a random-sampling method called random digit dialing, in which the survey’s designers would choose the area codes they wanted to reach and live operators randomly dialed seven-digit phone numbers within that area code.
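As a rough illustration of the idea, random digit dialing can be sketched in a few lines of Python. This is a simplified sketch rather than any pollster's actual system: the function name `random_digit_dial`, the example area codes and the fixed seed are assumptions made for the example, and the code ignores real numbering rules (for instance, which exchange prefixes are valid), which is one reason so many generated numbers would turn out to be unusable.

```python
import random

def random_digit_dial(area_codes, n, seed=None):
    """Generate n random phone numbers within the chosen area codes (illustrative only)."""
    rng = random.Random(seed)
    numbers = []
    for _ in range(n):
        area = rng.choice(area_codes)
        # Draw seven random digits: a three-digit exchange plus a four-digit line number.
        local = f"{rng.randrange(10_000_000):07d}"
        numbers.append(f"({area}) {local[:3]}-{local[3:]}")
    return numbers

# Hypothetical example: five numbers drawn from two area codes.
sample_numbers = random_digit_dial(["617", "857"], n=5, seed=1)
for number in sample_numbers:
    print(number)
```

Because every seven-digit combination is equally likely, households with a phone in those area codes have roughly the same chance of being called, but so do businesses, government offices and disconnected lines.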
By the 1990s, pollsters began moving away from random digit dialing, which was time-consuming and expensive because the random selection often picked phone numbers that were out of service or not useful for opinion surveys, such as businesses or government offices. Instead, pollsters began adopting registration-based sampling, in which public voter registration records were used to compile the lists from which respondents were randomly selected.
The information in these and other associated public records, such as those detailing gender, age and educational attainment, allowed a refinement of random sampling called stratified sampling. That’s where the one big list was split into subgroups based on these different characteristics, such as party affiliation, voting frequency, gender, race or ethnicity, income or educational attainment.
Survey-takers then chose randomly from among those subgroups in proportion to the population as a whole. So if 40% of the overall population have college degrees and 60% do not, a poll of 100 people would randomly select 40 people from the list of those with a college degree and 60 from the list of those without.
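The college-degree example can be written out as a short Python sketch. The function name `stratified_sample`, the made-up voter lists and the fixed seed are assumptions for illustration only; a real voter file would carry many more characteristics than one.

```python
import random

def stratified_sample(strata, population_shares, total, seed=None):
    """Randomly draw from each subgroup in proportion to its share of the population."""
    rng = random.Random(seed)
    sample = {}
    for name, members in strata.items():
        # Number of respondents this subgroup should contribute.
        k = round(total * population_shares[name])
        # Random selection without replacement inside the subgroup.
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical lists standing in for a voter file split by education.
college = [f"college_{i}" for i in range(1000)]
no_college = [f"no_college_{i}" for i in range(1500)]

picked = stratified_sample(
    {"college": college, "no_college": no_college},
    {"college": 0.40, "no_college": 0.60},
    total=100,
    seed=0,
)
print(len(picked["college"]), len(picked["no_college"]))  # prints: 40 60
```

The selection inside each list is still random; only the number taken from each list is fixed in advance.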
Other advances in ways to reach respondents emerged late in the 20th century, such as interactive voice response, which did not require live operators. Instead, automated systems played recordings of the questions and registered the spoken responses. In 2000, internet-based polling also began to emerge, in which participants filled out online forms.
From probability to nonprobability sampling
Over the past two decades, the rise of cellphones, text messaging and online platforms has dramatically changed survey research. The traditional gold standard of using only live operator telephone polls has become nearly obsolete. Now that phones display who is calling, fewer people answer calls from unknown numbers, and fewer of those who do answer are willing to talk to a stranger about their personal views.
Even the random sampling that was once standard has given way to a nonprobability approach, known as quota sampling, based on increasingly specific population proportions. So if 6% of a population are Black men with a certain level of education and a certain amount of household income, then a survey will strive to have 6% of its respondents match those characteristics.
In quota sampling, participants may not be selected randomly but rather chosen as participants because they have specific demographic attributes. This method is less statistically rigorous and more prone to bias, though it may yield a representative sample with relative efficiency. By contrast, stratified sampling randomly selects participants within defined groups, reducing sampling error and providing more precise estimates of population characteristics.
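To make the contrast concrete, here is a minimal Python sketch of quota filling. It is an illustration under stated assumptions, not a description of any pollster's software: the function name `fill_quotas`, the `group` field and the example stream of respondents are all hypothetical. Respondents are accepted in the order they arrive until each demographic quota is full, so nothing about who gets in is random.

```python
def fill_quotas(respondents, quotas):
    """Accept respondents in arrival order until every group's quota is met."""
    remaining = dict(quotas)
    accepted = []
    for person in respondents:
        group = person["group"]
        if remaining.get(group, 0) > 0:
            accepted.append(person)
            remaining[group] -= 1
        # Stop once every quota has been filled.
        if not any(remaining.values()):
            break
    return accepted

# Hypothetical stream of people who happened to respond, in arrival order.
groups = ["college", "college", "college", "no_college",
          "college", "no_college", "no_college", "no_college"]
stream = [{"id": i, "group": g} for i, g in enumerate(groups)]

accepted = fill_quotas(stream, {"college": 2, "no_college": 3})
print([person["id"] for person in accepted])  # prints: [0, 1, 3, 5, 6]
```

The final sample matches the target proportions, but it is made up of whoever responded first, which is where the extra risk of bias comes from.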
To help polling operations find potential respondents, political and marketing consulting firms have compiled voter information, including demographic data and contact details. At Emerson College Polling, we have access to a database of 273 million U.S. adults, with 123 million mobile numbers, 116 million email addresses and nearly 59 million landline numbers.
A newer technique pollsters are using to reach respondents is something called river sampling, an online method in which individuals encounter a survey during their regular internet browsing and social media activity, often through an ad or pop-up. They complete a short screening questionnaire and are then invited to join a survey opt-in panel whose members will be asked to take future surveys.
Emerson College Polling methodology
Our polling operation has used a range of approaches to reach the more than 162,000 people who have completed our polls so far this year in the United States.
Unlike traditional pollsters, Emerson College Polling does not rely on live operator data collection, except in small-scale tests that evaluate new survey methods and help improve the effectiveness of different polling approaches.
Instead, like most modern pollsters, we use a mix of approaches, including text-to-web surveys, interactive voice response on landlines, email outreach, and opt-in panels. This combination allows us to reach a broader, more representative audience, which is essential for accurate polling in today’s fragmented social and media landscape. That audience includes younger people, who communicate through platforms different from those used by older generations.
When we contact the people in our stratified samples, we take into account differences between each communication method. For example, older people tend to answer landlines, while men and middle-aged people are more responsive to mobile text-to-web surveys. To reach underrepresented groups – such as adults ages 18 to 29 and Hispanic respondents – we use online databases that they have voluntarily signed up for, knowing they may be surveyed.
We also use information about whom we sample, and how we sample them, to calculate the margin of error, which measures the precision of poll results. Larger sample sizes tend to be more representative of the overall population and therefore lead to a smaller margin of error.
For instance, a poll of 400 respondents typically has a margin of error of 4.9 percentage points, while increasing the sample size to 1,000 reduces it to about 3 points, yielding more precise estimates.
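Those figures are consistent with the standard textbook formula for a simple random sample at a 95% confidence level, which can be sketched in Python as follows. The article does not spell out the calculation, so the function name `margin_of_error`, the worst-case proportion of 0.5 and the z-value of 1.96 are assumptions here; polls that mix contact methods or weight their data would typically need further adjustments.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Margin of error for a simple random sample of size n.

    p=0.5 is the most conservative assumed proportion; z=1.96 corresponds to 95% confidence.
    """
    return z * math.sqrt(p * (1 - p) / n)

moe_400 = margin_of_error(400)
moe_1000 = margin_of_error(1000)

print(f"n=400:  {moe_400:.1%}")   # prints: n=400:  4.9%
print(f"n=1000: {moe_1000:.1%}")  # prints: n=1000: 3.1%
```

Because the sample size sits under a square root, cutting the margin of error in half requires roughly four times as many respondents.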
The goal, as ever, is to present to the public an accurate reflection of what the people as a whole think about candidates and issues.
Spencer Kimball, Associate Professor of Communications, Director of Emerson College Polling, Emerson College; Camille Mumford, Affiliated Professor in Communication Studies, Emerson College, and Matt Taglia, Senior Director of Emerson College Polling, Emerson College
This article is republished from The Conversation under a Creative Commons license. Read the original article.