Report: Managing the Challenges of AI Detection in Academia

Authors: Pierre Lebrun, YuYing Mak, James Shires, and Max Smeets

ECCRI CIC is excited to announce the release of our new report on managing the challenges of AI detection in academia. This report, researched in summer 2024, examines the increased use of AI chatbots in academia following the launch of ChatGPT in 2022 and the response by educational institutions through AI detection tools.

It discusses the varying effectiveness of these tools in identifying AI-generated content, with issues like false positives and false negatives. The report also highlights ethical concerns around privacy, accuracy, and the use of AI in education. Some institutions are reconsidering the use of detection tools, opting instead to integrate AI into curricula responsibly.

The rise in AI chatbots and detection tools

The launch of ChatGPT in 2022 generated worldwide interest in artificial intelligence (AI) and led to widespread use of AI chatbots, including by students. Following the emergence of AI chatbots, concerns were raised by higher education institutions about “unfair use of artificial intelligence generated content in an academic environment”1 and the “originality and appropriateness of the content generated by the chatbot”.2

To detect and manage the inappropriate or unfair use of such chatbots, AI detection tools have increased in popularity, with standard plagiarism tools, such as Turnitin, pivoting to detect AI-generated content with varying degrees of efficacy and at various price points.3 Most AI detection tools in academia are integrated into broader education platforms, such as Moodle, Canvas, Blackboard, Brightspace, Schoology, or Sakai.4

AI detection tools identify generated text by using pattern matching rather than comparing it to a database, as traditional plagiarism checkers do. Language models are trained on vast amounts of text data to learn probabilistic language rules, which they use to create new content. However, generated text often exhibits predictable patterns, such as consistent sentence structures, overuse of certain conjunctions and vocabulary, and predictable sentence or paragraph lengths. Detection tools aim to spot these patterns and may also incorporate traditional plagiarism checks to identify text that might have been reproduced directly from the model’s training data.5
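The kind of surface pattern described above can be illustrated with a minimal Python sketch. It is not how any commercial detector works; the function name, the word list, and the chosen statistics are assumptions made purely for illustration. It measures how uniform sentence lengths are, how often a few connective words appear, and how varied the vocabulary is.

```python
import re
from statistics import mean, pstdev

# Hypothetical list of connectives, chosen only for illustration.
CONNECTIVES = {"however", "moreover", "furthermore", "additionally", "therefore"}


def surface_features(text: str) -> dict:
    """Compute a few simple surface statistics for a piece of text."""
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    words = re.findall(r"[A-Za-z']+", text.lower())
    # Number of words in each sentence.
    lengths = [len(re.findall(r"[A-Za-z']+", s)) for s in sentences]
    return {
        "sentence_count": len(sentences),
        "mean_sentence_length": mean(lengths) if lengths else 0.0,
        # A low spread means sentences are all roughly the same length.
        "sentence_length_spread": pstdev(lengths) if lengths else 0.0,
        # Share of words that are one of the listed connectives.
        "connective_rate": (
            sum(w in CONNECTIVES for w in words) / len(words) if words else 0.0
        ),
        # Distinct words divided by total words.
        "vocabulary_ratio": len(set(words)) / len(words) if words else 0.0,
    }
```

Statistics like these are weak signals on their own. A careful human writer can produce uniform sentences, and a paraphrasing tool can vary them, which is consistent with the circumvention problems discussed later in this report.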

Responses to AI detection tools

When AI detection tools were first released, higher education institutions hastened to integrate them into education platforms. However, most, if not all, AI detection tools can be circumvented given enough time and effort.6 Some higher education institutions are therefore reversing their decision to use AI detectors. In 2023, Vanderbilt, Michigan State, Northwestern, and the University of Texas at Austin disabled their Turnitin AI detectors, citing the problems with effectiveness discussed below.7 Other education institutions are likely to follow suit, as detection tools may be causing more problems than they solve.8 Some academic institutions are not only disabling AI detection tools, but also finding ways to incorporate LLMs ethically and productively into their curricula.9

Moreover, new “humanizer” tools have been released to enable LLM users to bypass AI detection tools through “rephrasing sentences, altering structures, and incorporating varied vocabulary”, which significantly reduces the likelihood of AI detection.10 Initial research suggests that paraphrasing tools significantly complicate AI detection.11 For example, the Washington Post found that Turnitin struggles to identify AI-generated content when the text mixes human and AI-generated content through paraphrasing tools.12

Although Turnitin has added a new AI paraphrasing detection feature to its AI detection tool,13 such evasion tools create a difficult market context for AI detection, with other companies pivoting to other business models,14 or closing.15

What AI detection tools are on the market?

A selection of major AI detection tools is listed below in alphabetical order. We have also included publicly accessible information regarding education platform integration, pricing (in USD), and release and/or update date. Note that most of the AI detection tools listed below are mainly effective against ChatGPT-3.5 only.

| AI Detection Tool | Is there integration into education platforms? | Pricing (USD) | Date released/updated |
|---|---|---|---|
| Compilatio | Yes: Moodle, Brightspace, Canvas, Microsoft Teams, Blackboard, Open LMS | No information found [16] | February 2023 |
| Content at Scale | Yes: limited information | $49/month [17] | No information |
| Content Detector AI | No information | No information found | 2023 [18] |
| Copyleaks | Yes: Moodle, Canvas, Blackboard, Brightspace, Schoology, Sakai | $7.99-$13.99/month [19] | January 2023 |
| Crossplag | No information | $7-$100/month [20] | January 2023 |
| Detect GPT | No information | $7-$29/month [21] | No information |
| Duplichecker | No information | $110-$2000/year [22] | 2024 |
| Go Winston | No information | $12-$32/month [23] | February 2023 |
| GPT-Zero | Yes: Canvas, Coursify.me, K16 solutions, NewsGuard | $10-$23/month [24] | January 2023 |
| Originality | Yes: Moodle, Scribbr | $14.95-$30/month [25] | November 2022 |
| Plagiarism Detector (AI detection) | No information | $110-$330/year [26] | No information |
| Quillbot | Yes: No details publicly available as to which platforms | $0-$8.33/month [27] | No information |
| Sapling | Unclear | $0-$12/month [28] | January 2023 |
| Scispace | Likely, however lack of information | $0-$8/month [29] | No information |
| Turnitin | Yes: Brightspace, Scribbr | $3/student/year [30] | April 2023 |
| Undetectable AI | No information | $5-$14.99/month [31] | May 2023 |
| Wordtune | Likely, however lack of information | $0-$9.99/month [32] | January 2023 |
| Writer’s AI detector | No information | $0-$18/month [33] | No information |
| ZeroGPT | Yes: No details publicly available as to which platforms | $0-$18.99/month [34] | January 2023 |

Bracketed numbers in the table refer to the numbered entries under Notes and References.

Effectiveness of AI detection tools

False positives

In the context of AI detection tools, false positives occur when an AI detection tool incorrectly identifies submitted content as generated by AI. Some studies indicate that AI detection tools have a high false positive rate, and only a few AI detection tools have significantly low rates of false positive detection.35 In an academic setting, this may mean incorrectly flagging student work as generated by AI when it is, in fact, human-generated. Results also differ depending on which AI model generated the submitted text and which AI detection tool tested it, and findings vary across studies.36 In addition, content by non-native English speakers is more likely to be incorrectly classified as AI-generated, which is a clear problem for educational institutions with students from various backgrounds.37
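To make the scale of the problem concrete, the following Python sketch shows how a false positive rate is computed and what it implies for a cohort. The function names and all numbers are hypothetical and are not taken from any of the studies cited here.

```python
def false_positive_rate(false_positives: int, true_negatives: int) -> float:
    """Share of human-written submissions wrongly flagged as AI-generated."""
    human_total = false_positives + true_negatives
    if human_total == 0:
        raise ValueError("need at least one human-written submission")
    return false_positives / human_total


def expected_false_flags(human_submissions: int, fpr: float) -> float:
    """Expected number of human-written submissions wrongly flagged."""
    return human_submissions * fpr


# Hypothetical evaluation: 200 human-written essays, 4 of them wrongly flagged.
rate = false_positive_rate(false_positives=4, true_negatives=196)

# Hypothetical cohort: 5,000 human-written submissions in one term.
wrongly_flagged = expected_false_flags(5000, rate)
```

Under these assumed numbers, a rate that looks small in a test set still translates into a substantial number of students wrongly flagged once the tool is applied to every submission in an institution.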

False negatives

In the context of AI detection tools, false negatives occur when an AI detection tool fails to identify submitted AI-generated content as such. Some tools have shown low sensitivity, correctly identifying barely 15% of submitted samples as AI-generated,38 whilst others demonstrate a near-perfect score in classifying human-written content, misclassifying only 3% of AI-generated samples.39 In general, accuracy varies widely depending on which AI detection tool is used. One study found that only two of the main AI detection tools correctly classified all 126 samples as either AI- or human-generated.40 Other researchers claim that AI detection tools produce more false negatives when analyzing more sophisticated language.41
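Sensitivity and the false negative rate are two views of the same count, as the following Python sketch shows. The function name and the sample data are hypothetical; the data is constructed only to reproduce a sensitivity of 15%, matching the low figure mentioned above.

```python
from typing import Sequence, Tuple


def sensitivity_and_fnr(
    is_ai: Sequence[bool], flagged: Sequence[bool]
) -> Tuple[float, float]:
    """Return (sensitivity, false negative rate) for a detector.

    is_ai[i]   -- True if sample i really was AI-generated
    flagged[i] -- True if the detector labelled sample i as AI-generated
    """
    if len(is_ai) != len(flagged):
        raise ValueError("both sequences must have the same length")
    # Only AI-generated samples count towards sensitivity.
    true_pos = sum(1 for a, f in zip(is_ai, flagged) if a and f)
    false_neg = sum(1 for a, f in zip(is_ai, flagged) if a and not f)
    ai_total = true_pos + false_neg
    if ai_total == 0:
        raise ValueError("need at least one AI-generated sample")
    sensitivity = true_pos / ai_total
    return sensitivity, false_neg / ai_total


# Hypothetical test set: 100 AI-generated samples, of which 15 are caught,
# plus 20 human-written samples that are not flagged.
truth = [True] * 100 + [False] * 20
labels = [True] * 15 + [False] * 85 + [False] * 20
sens, fnr = sensitivity_and_fnr(truth, labels)
```

With this constructed data the detector misses 85 of every 100 AI-generated samples. Human-written samples do not affect either figure, which is why a tool can score well on human-written text while still performing poorly on AI-generated text.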

Other considerations

In general, the effectiveness of AI detection tools varies depending on what tool is used, and against what model. One study found that AI detection tools are more effective with ChatGPT-3.5 content, and less so with ChatGPT-4, except for Copyleaks, Turnitin, and Originality.ai, which had greater than 83% accuracy in detecting ChatGPT-4 content.42 This study concluded that “a detector’s free or paid status is not a good indicator of its accuracy”,43 although contrasting findings (with a small sample size) tentatively suggest that paid AI detection tools perform better than free ones.44 Studies also generally focus on the effectiveness of AI detection tools against ChatGPT, ignoring other LLMs. This may be due to the greater popularity of OpenAI’s models compared to others such as Gemini, Mistral or Command.

The ethics of using AI detection tools

The use of AI chatbots in academia raises significant ethical questions, beginning with reputational damage for both students and higher education institutions. For students, failing to disclose use of AI-generated content and passing it off as their own can harm their ongoing education and future careers. Universities can similarly face accusations of enabling plagiarism, cheating, and failing to uphold academic integrity.

However, the use of AI detection tools without proper safeguards generates equally significant concerns around privacy and consent, especially regarding the contractual arrangements between universities and the tool provider. Such concerns include what happens to uploaded content, how it is stored, and whether consent is obtained if uploaded content is used as future training data.

Furthermore, as the previous section discussed, AI detection tools may misidentify human-written content as AI (false positives) or fail to detect AI-generated text (false negatives). Accuracy varies widely, with some tools better at detecting ChatGPT-3.5. Finally, they play a cat-and-mouse game with methods to evade detection – including software that specifically generates content designed to be undetectable by standard AI detection tools.45 

AI detection tools also contribute to broader debates around access, equity, and environmental impact. Students may be using AI to support translation and comprehension of coursework, especially if they are studying in an English-speaking country and are from a non-English speaking or other minoritized background with historically fewer opportunities for university education. Access issues also arise due to the commercial availability of LLMs; more well-off students may be able to pay for more sophisticated models and/or feed their work through multiple LLMs, meaning that chances of detection drop significantly.46

About the Cybersecurity Seminars Program

The Google.org Cybersecurity Seminars program seeks to offer more and better learning and job opportunities in cybersecurity to students at selected universities and other eligible higher education institutions in Europe, the Middle East and Africa. These students will get to put what they learn into practice, which will not only advance their skills but also positively affect the communities around them.

The program also addresses new risks from artificial intelligence (AI), providing students with an understanding of AI-based changes to the cyber threat landscape and helping them effectively integrate AI into practical cybersecurity measures.

Notes and References

  1.  Weber-Wulff et al., “Testing of Detection Tools for AI-Generated Text.”
  2.  Elkhatat, Elsaid, and Almeer, “Evaluating the Efficacy of AI Content Detection Tools in Differentiating between Human and AI-Generated Text.”
  3.  Although this report only focuses on academic uses, we recognize there are use-cases and potential benefits for AI detection tools beyond academia, such as in the publishing industry, journalism or recruitment and HR.
  4.  Copyleaks, “LMS Plagiarism Checker Plugin.”
  5.  Furze, “AI Detection in Education Is a Dead End.”
  6.  Coffey, “Professors Cautious of Tools to Detect AI-Generated Writing.”
  7.  Ghaffary, “Universities Rethink Using AI Writing Detectors to Vet Students’ Work.”; Coley, “Guidance on AI Detection and Why We’re Disabling Turnitin’s AI Detector.”
  8.  Furze, “AI Detection in Education Is a Dead End.”
  9.  Cornell University, “Ethical AI for Teaching and Learning.”
  10.  MarGrowth, “UPass Review.”
  11.  Kar et al., “How Sensitive Are the Free AI-Detector Tools in Detecting AI-Generated Texts?”; Weber-Wulff et al., “Testing of Detection Tools for AI-Generated Text.”; Sadasivan et al., “Can AI-Generated Text Be Reliably Detected?”; Krishna et al., “Paraphrasing Evades Detectors of AI-Generated Text, but Retrieval Is an Effective Defense.”
  12.  Fowler, “We Tested a New ChatGPT-Detector for Teachers. It Flagged an Innocent Student.”
  13.  Young, “AI Paraphrasing Detection.”
  14.  Edwards, “Why AI Writing Detectors Don’t Work.”
  15.  Coldewey, “OpenAI Scuttles AI-Written Text Detector over ‘Low Rate of Accuracy.’”; Coley, “Guidance on AI Detection and Why We’re Disabling Turnitin’s AI Detector.”
  16.  Compilatio, “AI Content Checker.”
  17.  Content At Scale, “The AI Detector Is a Real-Time AI Checker and ChatGPT Detector.”
  18.  Copyleaks, “Copyleaks Launches AI Content Detector | Press Release.”
  19.  Copyleaks, “Pricing.”
  20.  Ivankov, “What Is Crossplag AI Detector?”
  21.  DetectGPT, “The AI Detector You Can Trust – DetectGPT.”
  22.  Dupli Checker, “Pricing & Plans.”
  23.  Winston AI, “Pricing.”
  24.  GPTZero, “Pricing.”
  25.  Scribbr, “Frequently Asked Questions: How Much Does Originality AI Cost?”
  26.  PlagiarismDetector.net, “Pricing & Plans | Plagiarismdetector.Net.”
  27.  Quillbot, “Quillbot: Pricing.”
  28.  API integration has separate pricing; Sapling AI, “Plans and Pricing | Sapling.”; Sapling AI, “API Pricing | Sapling.Ai Developer Documentation.”
  29.  SciSpace, “SciSpace Premium – Unlimited Access to AI Research Tools.”
  30.  Miller, “Turnitin Pricing in 2024.”
  31.  Undetectable AI, “Undetectable Pricing.”
  32.  Wordtune, “Wordtune Pricing and Plans | Choose Your Plan.”
  33.  Writer AI Studio, “Pricing.”
  34.  API access possible for universities; ZeroGPT, “ZeroGPT – Pricing.”
  35.  Popkov and Barrett, “AI vs Academia.”
  36.  Copyleaks, Turnitin, Originality.ai, Scribbr, Grammica, GPTZero, Crossplag, OpenAI, IvyPanda, GPT Radar, Content at Scale, Writer and Content Detector are able to classify human-generated content, while ZeroGPT and SEO.ai are ineffective in this regard. Walters, “The Effectiveness of Software Designed to Detect AI-Generated Writing.”; Popkov and Barrett, “AI vs Academia.”
  37.  Liang et al., “GPT Detectors Are Biased against Non-Native English Writers.”
  38.  Popkov and Barrett, “AI vs Academia.”
  39.  Ibid.
  40.  Walters, “The Effectiveness of Software Designed to Detect AI-Generated Writing.”
  41.  Ryan, “ChatGPT Detectors Are Biased and Easy to Fool, Research Shows.”
  42.  Walters, “The Effectiveness of Software Designed to Detect AI-Generated Writing.”
  43.  Ibid.
  44.  Popkov and Barrett, “AI vs Academia.”
  45.  Weber-Wulff et al., “Testing of Detection Tools for AI-Generated Text.”
  46.  Furze, “AI Detection in Education Is a Dead End.”

Bibliography

Coffey, Lauren. “Professors Cautious of Tools to Detect AI-Generated Writing.” Inside Higher Ed, February 9, 2024. https://www.insidehighered.com/news/tech-innovation/artificial-intelligence/2024/02/09/professors-proceed-caution-using-ai.

Coldewey, Devin. “OpenAI Scuttles AI-Written Text Detector over ‘Low Rate of Accuracy.’” TechCrunch, July 25, 2023. https://techcrunch.com/2023/07/25/openai-scuttles-ai-written-text-detector-over-low-rate-of-accuracy/.

Coley, Michael. “Guidance on AI Detection and Why We’re Disabling Turnitin’s AI Detector.” Vanderbilt University, August 16, 2023. https://www.vanderbilt.edu/brightspace/2023/08/16/guidance-on-ai-detection-and-why-were-disabling-turnitins-ai-detector/.

Compilatio. “AI Content Checker: Detect AI with Compilatio.” Compilatio. Accessed August 27, 2024. https://www.compilatio.net/en/ai-detector-info.

Content At Scale. “The AI Detector Is a Real-Time AI Checker and ChatGPT Detector.” Content @ Scale. Accessed August 27, 2024. https://brandwell.ai/ai-content-detector/.

Copyleaks. “Copyleaks Launches AI Content Detector | Press Release.” Copyleaks. Accessed August 27, 2024. https://copyleaks.com/about-us/media/copyleaks-launches-ai-content-detector.

———. “LMS Plagiarism Checker Plugin.” Copyleaks. Accessed August 29, 2024. https://copyleaks.com/learning-management-systems.

———. “Pricing.” Copyleaks. Accessed August 19, 2024. https://copyleaks.com/pricing.

Cornell University. “Ethical AI for Teaching and Learning.” Center for Teaching Innovation. Accessed August 29, 2024. https://teaching.cornell.edu/generative-artificial-intelligence/ethical-ai-teaching-and-learning.

DetectGPT. “The AI Detector You Can Trust – DetectGPT.” Accessed August 27, 2024. https://detectgpt.com/#pricing.

Dupli Checker. “Pricing & Plans.” Duplichecker.com. Accessed August 27, 2024. https://www.duplichecker.com/pricing.

Edwards, Benj. “Why AI Writing Detectors Don’t Work.” Ars Technica, July 14, 2023. https://arstechnica.com/information-technology/2023/07/why-ai-detectors-think-the-us-constitution-was-written-by-ai/.

Elkhatat, Ahmed M., Khaled Elsaid, and Saeed Almeer. “Evaluating the Efficacy of AI Content Detection Tools in Differentiating between Human and AI-Generated Text.” International Journal for Educational Integrity 19, no. 1 (September 1, 2023): 17. https://doi.org/10.1007/s40979-023-00140-5.

Fowler, Geoffrey A. “We Tested a New ChatGPT-Detector for Teachers. It Flagged an Innocent Student.” Tech in Your Life, April 3, 2023. https://www.washingtonpost.com/technology/2023/04/01/chatgpt-cheating-detection-turnitin/.

Furze, Leon. “AI Detection in Education Is a Dead End.” Leon Furze, April 8, 2024. https://leonfurze.com/2024/04/09/ai-detection-in-education-is-a-dead-end/.

Ghaffary, Shirin. “Universities Rethink Using AI Writing Detectors to Vet Students’ Work.” Bloomberg.com, September 21, 2023. https://www.bloomberg.com/news/newsletters/2023-09-21/universities-rethink-using-ai-writing-detectors-to-vet-students-work.

GPTZero. “Pricing.” GPTZero. Accessed August 19, 2024. https://gptzero.me/.

Ivankov, Olga. “What Is Crossplag AI Detector? Pricing, Features and How to Use,” June 12, 2024. https://articlesbase.com/tech/emerging-technologies/artificial-intelligence/ai-tools-and-software/what-is-crossplag-ai-detector-pricing-features-and-how-to-use/.

Kar, Sujita Kumar, Teena Bansal, Sumit Modi, and Amit Singh. “How Sensitive Are the Free AI-Detector Tools in Detecting AI-Generated Texts? A Comparison of Popular AI-Detector Tools.” Indian Journal of Psychological Medicine, May 11, 2024, 02537176241247934. https://doi.org/10.1177/02537176241247934.

Krishna, Kalpesh, Yixiao Song, Marzena Karpinska, John Wieting, and Mohit Iyyer. “Paraphrasing Evades Detectors of AI-Generated Text, but Retrieval Is an Effective Defense,” 2023. https://doi.org/10.48550/ARXIV.2303.13408.

Liang, Weixin, Mert Yuksekgonul, Yining Mao, Eric Wu, and James Zou. “GPT Detectors Are Biased against Non-Native English Writers.” Patterns 4, no. 7 (2023): 100779. https://doi.org/10.1016/j.patter.2023.100779.

MarGrowth. “UPass Review: How Effective It Can Bypass AI Detection | HackerNoon,” July 31, 2024. https://hackernoon.com/upass-review-how-effective-it-can-bypass-ai-detection.

Miller, Nick. “Turnitin Pricing in 2024: What Does It Cost?” Medium (blog), July 11, 2024. https://medium.com/@nickmiller_writer/turnitin-pricing-in-2024-what-does-it-cost-80f552a7a20f.

PlagiarismDetector.net. “Pricing & Plans | Plagiarismdetector.Net.” PlagiarismDetector.net. Accessed August 27, 2024. https://plagiarismdetector.net/pricing.

Popkov, Andrey A., and Tyson S. Barrett. “AI vs Academia: Experimental Study on AI Text Detectors’ Accuracy in Behavioral Health Academic Writing.” Accountability in Research, March 22, 2024, 1–17. https://doi.org/10.1080/08989621.2024.2331757.

Quillbot. “Pricing.” Quillbot, n.d. https://quillbot.com/premium.

Ryan, Jackson. “ChatGPT Detectors Are Biased and Easy to Fool, Research Shows.” CNET, July 12, 2023. https://www.cnet.com/tech/services-and-software/chatgpt-detectors-are-biased-and-easy-to-fool-research-shows/.

Sadasivan, Vinu Sankar, Aounon Kumar, Sriram Balasubramanian, Wenxiao Wang, and Soheil Feizi. “Can AI-Generated Text Be Reliably Detected?,” 2023. https://doi.org/10.48550/ARXIV.2303.11156.

Sapling AI. “API Pricing | Sapling.Ai Developer Documentation.” Sapling AI. Accessed August 27, 2024. https://sapling.ai/docs/api/pricing/.

———. “Plans and Pricing | Sapling.” Sapling AI. Accessed August 27, 2024. https://sapling.ai/pricing.

SciSpace. “SciSpace Premium – Unlimited Access to AI Research Tools.” SciSpace. Accessed August 27, 2024. https://typeset.io/pricing.

Scribbr. “Frequently Asked Questions: How Much Does Originality AI Cost?” Scribbr. Accessed August 19, 2024. https://www.scribbr.com/frequently-asked-questions/how-much-does-originality-ai-cost/#:~:text=Originality.ai%20offers%20two%20pricing,10%20words%20for%20fact%20checking.

Undetectable AI. “Undetectable Pricing: Choose the Perfect Plan for Your Needs.” Undetectable AI. Accessed August 27, 2024. https://undetectable.ai/pricing.

Walters, William H. “The Effectiveness of Software Designed to Detect AI-Generated Writing: A Comparison of 16 AI Text Detectors.” Open Information Science 7, no. 1 (October 6, 2023): 20220158. https://doi.org/10.1515/opis-2022-0158.

Weber-Wulff, Debora, Alla Anohina-Naumeca, Sonja Bjelobaba, Tomáš Foltýnek, Jean Guerrero-Dib, Olumide Popoola, Petr Šigut, and Lorna Waddington. “Testing of Detection Tools for AI-Generated Text.” International Journal for Educational Integrity 19, no. 1 (December 25, 2023): 26. https://doi.org/10.1007/s40979-023-00146-z.

Winston AI. “Pricing.” Winston AI (blog). Accessed August 27, 2024. https://gowinston.ai/pricing/.

Wordtune. “Wordtune Pricing and Plans | Choose Your Plan.” Wordtune. Accessed August 27, 2024. https://www.wordtune.com/plans.

Writer AI Studio. “Pricing.” Writer AI Studio. Accessed August 27, 2024. https://dev.writer.com/home/pricing.

Young, Laura. “AI Paraphrasing Detection: Strengthening the Integrity of Academic Writing.” Turnitin, July 16, 2024. https://www.turnitin.com/blog/ai-paraphrasing-detection-strengthening-the-integrity-of-academic-writing.

ZeroGPT. “ZeroGPT – Pricing.” Accessed September 3, 2024. https://www.zerogpt.com/pricing.