December 23, 2022
In the insights industry, experts have described 2022 as the Year of Data Quality. There is no doubt that it has been a hot topic of discussion and debate throughout the year. There is also common ground: most agree there is no silver bullet for data quality issues in surveys.
As the Swiss cheese model suggests, to have the best chance of preventing survey fraud and poor data quality, we need to approach the problem as layers of protection implemented throughout the research process.
To this end, the Insights Association Data Integrity Initiative Council has published a hands-on toolkit. It includes a Checks of Integrity Framework with concrete data integrity measures for all phases of survey research: pre-survey, in-survey, and post-survey.
What constitutes good data quality remains nebulous. We can agree on what is very bad data, such as gibberish open-ended responses. However, identifying poor-quality data is rarely so simple. Deciding which responses to keep or remove from a dataset is often a tough call, and those calls are often based on our own assumptions and our tolerance for imperfection.
Because objectively defining data quality is difficult, researchers have developed a wide range of in-survey checks, including instructional manipulation checks, low-incidence questions, speeder flags, straightlining detection, red-herring questions, and open-end review, that act as predictors of poor-quality participants. But, like data quality itself, these predictors are subjective in nature.
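As a rough illustration, a few of these checks can be scored programmatically. The Python sketch below uses hypothetical column names, thresholds, and a made-up "correct" attention-check answer; it is not the toolkit's implementation, and any real cutoffs would need to be justified for the study at hand.

```python
# A minimal sketch of scoring a few common in-survey checks.
# Column names, thresholds, and the attention-check answer are assumptions.
import pandas as pd

def straightlining_share(grid: pd.DataFrame) -> pd.Series:
    """Share of grid items answered with the respondent's most-used scale point."""
    mode_counts = grid.apply(lambda row: row.value_counts().iloc[0], axis=1)
    return mode_counts / grid.shape[1]

def flag_checks(df: pd.DataFrame) -> pd.DataFrame:
    grid_cols = [c for c in df.columns if c.startswith("grid_")]  # assumed naming
    flags = pd.DataFrame(index=df.index)
    # Straightlining: nearly identical answers across a rating grid.
    flags["straightliner"] = straightlining_share(df[grid_cols]) >= 0.9
    # Red herring: failed an attention-check item with a known correct answer.
    flags["failed_red_herring"] = df["attention_check"] != 3  # assumed correct answer
    # Open end: suspiciously short or single-repeated-character responses.
    oe = df["open_end"].fillna("").str.strip()
    flags["poor_open_end"] = (oe.str.len() < 5) | oe.str.fullmatch(r"(.)\1*")
    return flags

# Toy data for illustration
df = pd.DataFrame({
    "grid_1": [5, 2, 4], "grid_2": [5, 3, 4], "grid_3": [5, 1, 4],
    "attention_check": [3, 5, 3],
    "open_end": ["Liked the taste and price.", "asdfgh", "aaaa"],
})
print(flag_checks(df))
```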
The in-survey checks typically built into surveys inadvertently produce both false positives (valid respondents incorrectly flagged as problematic) and false negatives (problematic respondents incorrectly accepted as valid).
In fact, these in-survey checks may penalize human error too harshly while making it too easy for experienced participants, whether fraudsters or professional survey takers, to fall through the cracks. For example, most surveys exclude speeders: participants who complete the survey too quickly to have provided thoughtful responses.
While researchers are likely to agree on what is unreasonably fast (or bot-fast!), there is no consensus on what is just a little too fast. Is it the fastest 10% of the sample? Those who complete in less than 33% of the median duration?
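To make the ambiguity concrete, here is a minimal sketch of both rules, assuming a duration-in-seconds field and purely illustrative cutoffs:

```python
# A minimal sketch of two common speeder rules. The data and both thresholds
# are illustrative assumptions, not recommendations.
import pandas as pd

def flag_speeders(durations: pd.Series) -> pd.DataFrame:
    flags = pd.DataFrame(index=durations.index)
    # Rule 1: the fastest 10% of the sample.
    flags["fastest_decile"] = durations <= durations.quantile(0.10)
    # Rule 2: completing in less than 33% of the median duration.
    flags["below_33pct_of_median"] = durations < 0.33 * durations.median()
    return flags

durations = pd.Series([60, 150, 400, 450, 480, 500, 520, 540, 560, 600],
                      name="duration_seconds")
print(flag_speeders(durations).sum())
```

Even on this toy data, the two rules disagree about who counts as a speeder.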
The subjectivity baked into these rules can result in researchers flagging honest participants who simply read and process information faster, or who are less engaged with the category. At the same time, researchers might not flag participants with excessively long response times: the crawlers who may be translating the survey or fraudulently filling out more than one survey at a time.
These errors have a serious impact on the research. On the one hand, false positives can have negative consequences such as providing a poor survey experience and alienating honest participants.
If that is not a compelling enough reason to avoid false positives, consider the extra days of fieldwork needed to replace removed participants. On the other hand, false negatives can cause researchers to draw conclusions from dubious data, leading to bad business decisions.
Our ultimate goal as responsible researchers is to minimize these errors. To achieve this, it is critical that we shift our focus to understanding which data integrity measures are most effective at flagging the right participants. With this in mind, using advanced analytics (e.g., Root Likelihood in conjoint or MaxDiff) to identify randomly answering, poor-quality participants presents a huge opportunity.
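As a rough sketch of the idea (not any particular vendor's implementation): Root Likelihood (RLH) is the geometric mean of the probabilities an estimated choice model assigns to the answers a respondent actually gave, and purely random answering in tasks with k alternatives lands near 1/k. The probabilities and cutoff below are made up for illustration; in practice they would come from a fitted model such as a hierarchical Bayes logit.

```python
# A minimal sketch of the Root Likelihood (RLH) idea. The per-task
# probabilities and the flagging cutoff are illustrative assumptions.
import numpy as np

def root_likelihood(chosen_probs: np.ndarray) -> float:
    """Geometric mean of the model probabilities of the chosen alternatives."""
    return float(np.exp(np.mean(np.log(chosen_probs))))

n_alternatives = 4                    # alternatives shown per choice task
chance_rlh = 1.0 / n_alternatives     # RLH floor for purely random answers

respondents = {
    "engaged": np.array([0.62, 0.55, 0.71, 0.48, 0.66]),  # model fits well
    "random":  np.array([0.26, 0.24, 0.27, 0.23, 0.25]),  # hovers near chance
}

for rid, probs in respondents.items():
    rlh = root_likelihood(probs)
    flagged = rlh < 1.3 * chance_rlh  # illustrative cutoff, not a standard
    print(f"{rid}: RLH={rlh:.2f}, flagged={flagged}")
```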
In 2022, much worthwhile effort was devoted to raising awareness and educating insights professionals, especially on how to identify and mitigate issues in survey response quality. Moving forward, researchers need a better understanding of which data integrity measures are most effective at objectively identifying problematic respondents in order to minimize false positives and false negatives.