The Prompt

February 16, 2024

The Cost of Being Wrong: How Overconfidence in Ineffective AI Detection Tools Impacts the Research Ecosystem

Discover the challenge of identifying AI-generated open-ended responses and the potential consequences for researchers and the market research industry.

by Karine Pepin, Co-Founder at The Research Heads, and Dan Wasserman, Chief Operating Officer at KJT

Conducting research to understand consumers is crucial for helping companies make informed business decisions. Poor data quality jeopardizes the integrity of research, compromising insights and leading to suboptimal business decisions. As such, significant time and resources are dedicated to post-field quantitative data cleaning, including assessing the quality of participants' open-ended responses.

With the rise of generative AI, researchers face a new challenge: identifying open-ended responses that are 'too good' and may be AI-generated. In doing so, we risk disqualifying valuable participants – those with the best answers – which has a significant impact on both research outcomes and the entire market research ecosystem.

The Risk of Relying on AI Detection Tools

While researchers have traditionally reviewed open-ends through painstaking manual reading, AI-powered tools are desirable due to their speed and cost benefits (vs. human review). These tools attempt to determine whether a respondent used AI, such as ChatGPT, to craft responses to a survey open-end. If respondents don't write their own responses, their data quality is generally considered suspect, at best.

When ChatGPT was initially released (with the GPT-3 model), researchers often felt somewhat comfortable identifying what was generated by AI and what was not. With both the release of GPT-4 in March 2023 and improved prompt education, it has become more difficult to identify what has been crafted by a large language model and what was authentically written by a human.


Research has shown just how difficult it is to spot the difference. One peer-reviewed study found that even experts correctly identified AI versus human writing in only 38.9% of cases.1 Another evaluation of AI detection tools demonstrated that these tools often produce false positives and false negatives and do not meet the accuracy claims presented by the tools themselves.2 Even OpenAI says AI detectors do not work.3

While these tools will likely improve over time, our trust in their accuracy should be properly calibrated. Reviewing open-end data remains a critical practice; however, relying largely on automated AI checkers is insufficient for judging respondent quality, as the results can be misleading.

These tools can produce two kinds of errors: a false negative, where a respondent is not flagged despite providing an AI-generated response, and a false positive, where a respondent is kicked out of the survey despite being human and providing a quality response. Sufficiently low rates of both errors are necessary to establish trust in the results.
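To see why low error rates matter, consider a minimal sketch of how a detector's errors play out across a typical sample where most respondents are genuine humans. The detection rates and sample figures below are hypothetical assumptions for illustration only, not measurements of any specific tool.

```python
# Hypothetical illustration of how detector error rates play out at scale.
# All numbers are assumptions for this sketch, not figures from any real tool.

def detection_outcomes(n_respondents, ai_share, sensitivity, false_positive_rate):
    """Return counts of caught, missed, and wrongly rejected respondents.

    n_respondents       -- total survey completes
    ai_share            -- fraction of respondents who actually used AI
    sensitivity         -- probability the tool flags a genuine AI-assisted response
    false_positive_rate -- probability the tool flags an authentic human response
    """
    ai_users = n_respondents * ai_share
    humans = n_respondents - ai_users

    true_positives = ai_users * sensitivity           # AI-assisted responses caught
    false_negatives = ai_users - true_positives       # AI-assisted responses that slip through
    false_positives = humans * false_positive_rate    # genuine respondents wrongly rejected

    return true_positives, false_negatives, false_positives


# Example: 1,000 completes, 5% AI-assisted, a tool that catches 80% of them
# but also wrongly flags 10% of authentic human responses.
tp, fn, fp = detection_outcomes(1000, 0.05, 0.80, 0.10)
print(f"AI-assisted responses caught:        {tp:.0f}")  # 40
print(f"AI-assisted responses missed:        {fn:.0f}")  # 10
print(f"Genuine respondents wrongly removed: {fp:.0f}")  # 95
```

Under these assumed rates, the tool removes more genuine respondents (95) than AI-assisted responses it catches (40), which is exactly the trade-off that makes overconfidence in such tools costly.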

Then and Now in the Era of AI Tools

Historically, our attention has been devoted to excluding participants with poorly written open-ended responses, responses deemed irrelevant, or answers blatantly copied from the internet. While we've consistently exercised caution with open-ends that appear 'too good,' the prevalence of AI tools has further intensified our vigilance. The new challenge lies in the risk of inadvertently disqualifying the best participants rather than the worst ones.

The Impact on the Research Ecosystem

Given the lack of consensus on defining quality, researchers make their own judgment calls. While panels replace 'bad' completes at no charge, we're paying for 'false positives' in other ways.

Reducing the pool of good participants

Rejecting good participants not only weakens the current state of the research ecosystem, but also threatens its future. Amid the ongoing data quality crisis, good participants are our most valuable asset. Attracting and retaining high-quality panelists is necessary to maintain credibility as an industry.

Introducing bias to the insights

The subjective nature of data cleaning introduces errors; each time we make the wrong data cleaning decisions, we introduce bias into the data that crucial business decisions depend on.

Diminishing productivity

In the process, we also waste valuable time and resources on data cleaning, extend fieldwork to replace completes, and miss deadlines.

Balancing AI Detection Tools and Research Integrity

When we use ineffective AI detection tools that contribute to rejecting the best participants, we alienate the most thoughtful people, bias the insights, and diminish our productivity. Most importantly, we may miss out on valuable insights that could shape critical business decisions.

A multi-pronged approach to evaluating respondent quality is essential, but our most potent tool is recruiting and validating participants thoroughly to establish a strong foundation of trust in them.

Researchers should continue learning about AI, both in terms of how it functions (foundational knowledge) and how specific tools can be applied to market research. It is also imperative to support industry initiatives around data quality, such as the Global Data Quality collaboration. Collectively, the more we think critically about how decisions to include or exclude respondents affect both study results and the larger ecosystem, the better the outcome for everyone in the industry.

References

  1. Casal JE, Kessler M. Can linguists distinguish between ChatGPT/AI and human writing? A study of research ethics and academic publishing. Research Methods in Applied Linguistics. 2023;2(3):100068. doi:10.1016/j.rmal.2023.100068
  2. Weber-Wulff D, Anohina-Naumeca A, Bjelobaba S, et al. Testing of detection tools for AI-generated text. International Journal for Educational Integrity. 2023;19(1). doi:10.1007/s40979-023-00146-z
  3. How can educators respond to students presenting AI-generated content as their own? OpenAI Help. Accessed January 9, 2024. https://help.openai.com/en/articles/8313351-how-can-educators-respond-to-students-presenting-ai-generated-content-as-their-own.
