The Prompt

August 25, 2023

Too Good to be True: How AI is Impacting Data Quality

Picture this: you’re reviewing survey data and reading open-ended responses. You’ve just found the epitome of a good open end: a thorough, insightful answer with no spelling errors and no profanity. It’s perfect.

But is it… too perfect? This time last year, this probably wasn’t a question we would be asking ourselves.

We have tracked a steady decline in sample quality conversions since November 2022, driven by an increase in suspected fraud activity and rising project-level data quality removals. It’s probably no coincidence that ChatGPT, an artificial intelligence chatbot developed by OpenAI, launched on November 30, 2022. ChatGPT exploded into the mainstream practically overnight, reaching over 1 million users in just 5 days. Since then, we have seen an increase in suspicious open ends from historically valid panelists using AI to cut corners, more nefarious organizations bypassing fraud detection systems, and a resurgence of the haunting ghost completes.

The use of artificial intelligence by fraudsters is one of the clearest examples of how fraud evolves over time, always keeping us on our toes. Quality checks that work now, or that worked in the past, are not future-proof. The research industry will continue to improve its tools to block fraud, but fraudsters are also becoming more sophisticated in their ability to break into surveys. It will take a concerted and collaborative effort across the industry to work against fraud enabled by AI.

While we do see an increase in possible fraud and questionable quality, the heightened awareness of these activities has also caused the everyday data cleaner to be suspicious of everything: not only the responses lacking insight, but also those with too much insight. Where we used to flag open ends with too few words, we’re now suspicious of too many words, the use of similar words, responses that have the exact same word count, responses with perfect punctuation, or multiple responses with similar misspellings.
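As a rough illustration, a few of the checks listed above (too few words, too many words, identical word counts, and shared misspellings) could be sketched as simple heuristics. This is only a sketch under stated assumptions: the thresholds, the `known_words` dictionary, and the function names are all hypothetical, and the output is meant to route responses to human review rather than to remove them automatically.

```python
import re
from collections import Counter

# All thresholds are illustrative assumptions, not recommended values.
MIN_WORDS = 3               # below this, an open end is flagged as too short
MAX_WORDS = 80              # above this, an open end is flagged as too long
SAME_COUNT_MIN_WORDS = 20   # only compare word counts on longer answers


def words(text):
    """Split a response into lowercase word tokens."""
    return re.findall(r"[A-Za-z']+", text.lower())


def flag_open_ends(responses, known_words):
    """Return {respondent_id: [flag, ...]} for human review.

    responses: dict mapping respondent id -> open-end text
    known_words: set of lowercase words treated as correctly spelled
                 (a hypothetical dictionary supplied by the caller)
    """
    tokens = {rid: words(text) for rid, text in responses.items()}

    # How many responses share each exact word count.
    length_counts = Counter(len(t) for t in tokens.values())

    # How many respondents used each out-of-dictionary word.
    misspelled = Counter()
    for t in tokens.values():
        for w in set(t) - known_words:
            misspelled[w] += 1

    flags = {}
    for rid, t in tokens.items():
        found = []
        n = len(t)
        if n < MIN_WORDS:
            found.append("too_few_words")
        if n > MAX_WORDS:
            found.append("too_many_words")
        if n >= SAME_COUNT_MIN_WORDS and length_counts[n] > 1:
            found.append("shared_word_count")
        if any(misspelled[w] > 1 for w in set(t) - known_words):
            found.append("shared_misspelling")
        flags[rid] = found
    return flags
```

A flag here is a prompt for a closer look, not a verdict: an honest respondent can easily trip any one of these checks.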

With more thorough data cleaning measures being implemented (as they should be), there is cause for concern about the percentage of honest human responses that are being discarded out of an abundance of caution.

A couple of quick tips can help in spotting AI-generated responses. First, include an open-ended question that asks for the respondent’s opinion or feelings. AI tools currently tend not to respond with personal opinions or feelings on a topic, so their responses are a bit easier to identify because they are largely fact-driven. Second, include copy/paste detection in your survey. Blocking a respondent who attempts to paste an answer is not enough.

There is no good reason for a respondent to copy a response or question, either. Therefore, copying is a flag that can be added as a programmatic quality check as well. When AI use is discovered, it is important to relay the details back to the recruitment source so they can take a deeper dive into the panelist’s behaviors.
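One way to treat copying and pasting as a programmatic flag, rather than only blocking it, is to record the events and summarize them per respondent. The sketch below assumes the survey platform can export a log of `(respondent_id, question_id, event_type)` tuples with event names such as `"copy"` and `"paste"`. That log format, the event names, and the function name are hypothetical, not a feature of any particular platform.

```python
from collections import defaultdict

# Assumption: the survey platform logs clipboard activity under these names.
FLAGGED_EVENTS = {"copy", "paste"}


def clipboard_flags(events):
    """Group copy/paste events by respondent.

    events: iterable of (respondent_id, question_id, event_type) tuples,
            a hypothetical log format exported from the survey platform.
    Returns {respondent_id: sorted list of (question_id, event_type)},
    containing only respondents with at least one copy or paste event.
    """
    flagged = defaultdict(set)
    for respondent_id, question_id, event_type in events:
        if event_type in FLAGGED_EVENTS:
            flagged[respondent_id].add((question_id, event_type))
    return {rid: sorted(hits) for rid, hits in flagged.items()}
```

The per-respondent summary is the kind of detail that could be passed back to the recruitment source alongside the flagged open ends.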

Some market research organizations have already started acquiring, building, and implementing AI platforms to assist with their research. We anticipate good things as the industry collaborates to better understand AI technology and continues to strive toward quality. Moral of the story: don’t throw the baby out with the bathwater, but make sure the little guy gets a good scrub before we dress him up and send him out into the world.

Mary Draper

Vice President, Network Partners & Quality at EMI Research Solutions

Disclaimer

The views, opinions, data, and methodologies expressed above are those of the contributor(s) and do not necessarily reflect or represent the official policies, positions, or beliefs of Greenbook.
