Categories
Research Methodologies
April 14, 2023
While data quality has been the topic of much discussion in the market research industry for the past few years, little effort has been made to objectively define the concept.…
While data quality has been the topic of much discussion in the market research industry for the past few years, little effort has been made to objectively define the concept. Data quality is a hygiene factor that is often overlooked when present, but becomes noticeably problematic when missing. However, by defining data quality solely according to the absence of outliers, we risk losing sight of what truly makes data beautiful. What if we defined data quality based on what it is, rather than what it is not?
Often, the way we define data quality is limited to what it is not, by removing Satisfiers, Speeders, and Straight-liners. How we define these in-survey checks is subjective in nature and whether that practice actually works in improving overall results is questionable.
Picture this: You have just completed a long and arduous research project, and you’re eager to present your findings to your client. However, as you begin to delve into the data, your client starts to notice something troubling: the story doesn’t make sense. You feel your stomach drop as your client raises this concern, asking you to explain what’s going on. You rack your brain for an answer and finally settle on “But…there are no Speeders in our data.” Even as you say it, you realize that this is a poor defense. The absence of Speeders does not make the quality of your data good.
Instead, we ought to focus on defining what qualifies as good data.
Let’s take a philosophical step back and consider what makes data beautiful.
At its core, beautiful data makes sense. When we view data quality through this lens, it becomes less subjective than we might think. Data makes sense when the story of each participant is cohesive.
If you’ve seen bad data, you know that participants who cheat in surveys usually answer randomly, and the results are incoherent. For example, Gen Zs buying retirement properties, plumbers performing DNA sequencing, and retirees enrolling in kindergarten classes.
Cohesion doesn’t mean that the findings can’t be surprising; that’s why we do research! But if you were to look at each survey participant in your dataset row by row, you would find that good participants typically remain true to their persona throughout the survey. That’s cohesion.
Another hallmark of good data quality is when open-ended responses are relevant to the question at hand. Open-end responses that are consistent with the rest of the data in terms of themes or patterns further reinforce the cohesiveness of the data. Some might argue that gauging responses this way is also subjective, but the ultimate test is straightforward: Are you comfortable sharing the open-end responses with your client?
Simply removing Satisfiers, Straight-liners, and Speeders is not enough on its own. When we remove participants based on these rules, we simply shoehorn the metrics we have into telling us what we want to see instead of actually determining what we need to know.
To truly achieve good data quality, we need to develop tools that can help us identify a lack of participant-level cohesion. As an example, the Root Likelihood fit score is a great way of improving data quality by identifying participants who may have randomly responded to a choice task, such as a Conjoint exercise. These types of consistency checks are not only better indicators of good-quality data, but they are also less obvious to participants who may become skilled at avoiding the obvious quality assurance traps.
Comments
Comments are moderated to ensure respect towards the author and to prevent spam or self-promotion. Your comment may be edited, rejected, or approved based on these criteria. By commenting, you accept these terms and take responsibility for your contributions.
Disclaimer
The views, opinions, data, and methodologies expressed above are those of the contributor(s) and do not necessarily reflect or represent the official policies, positions, or beliefs of Greenbook.
More from Karine Pepin
Explore five key truths about sampling, uncovering fraud, low-quality respondents, and transparency issues that have eroded data quality over two deca...
Learn key principles of content design that enable researchers to distill insights into assets, fostering stakeholder influence and sustainable busine...
Discover the challenge of identifying AI-generated open-ended responses and the potential consequences for researchers and the market research industr...
This article discusses how the online sampling ecosystem favors professional respondents and bad actors. It advocates for a transformative shift towar...
Sign Up for
Updates
Get content that matters, written by top insights industry experts, delivered right to your inbox.
67k+ subscribers