ChatGPT has just celebrated its second anniversary. Launched on November 30, 2022, the chatbot developed by OpenAI climbed to 8th place in the ranking of the world’s most visited websites in an impressive two years. In this short period, the use of generative AI for writing has become part of our daily lives — at home, in the workplace, and especially in academic settings. The last of these has left educators and even students quite unsettled.
Recently, a friend complained to us about a colleague from his postgraduate program who, for a group project, wrote lengthy and detailed passages about the European Union Directive on the Protection of Fundamental Rights in the Digital Environment: a legal framework that simply does not exist. As he discovered, it was a hallucination generated by AI, which his colleague had failed to catch before submitting his portion of the work to the rest of the group.
This kind of incident sparks various ethical debates, but the realization that caught our attention the most is that students feel free to outsource their assignments to generative AI because, in practice, it is nearly impossible to determine whether a text was generated by AI or written by a human. There are indeed elements suggesting a text may have been created by AI, such as GPT-isms and obvious hallucinations like the one from our friend’s inattentive colleague. However, there is currently no reliable tool on the market that can certify the origin of a text with certainty.
Why is it so hard to detect generative AI texts?
Detecting AI-generated text with AI tools like GPTZero and ZeroGPT is challenging because generative models like GPT-4, Gemini and Claude are trained on massive datasets filled with diverse human-written texts. These models learn from millions of language examples, absorbing grammar, syntax, tone, and even cultural nuances, allowing them to generate text closely resembling human writing. With access to a vast range of language patterns, they can adapt to different styles – whether casual, formal, or even creative – making their output nearly indistinguishable from human-created content. As these models improve, their ability to mimic human language grows more sophisticated, blurring the line between machine and human authorship.
AI detection tools try to identify AI-generated text by examining specific metrics, notably “perplexity” and “burstiness.” Perplexity measures how predictable a piece of text is based on the AI model’s training; if the text closely resembles what the model has learned, it has low perplexity, indicating predictability. This is based on the idea that AI-generated text will lean toward common phrases and structured language. However, humans often use predictable phrases, especially in formal contexts, causing detection tools to struggle with false positives. Burstiness, on the other hand, examines variability in sentence length and structure. Human writing is generally thought to be more dynamic – alternating between short and complex sentences – whereas AI-generated text tends to be more uniform. This measure can signal AI generation, but it’s not foolproof, as humans can also write with consistency.
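To make these two metrics concrete, here is a minimal illustrative sketch in Python. The unigram model and the coefficient-of-variation formula for burstiness are simplifications of our own for the sake of the example; real detectors compute perplexity from a large language model’s token probabilities rather than from word counts.

```python
import math
import re
from collections import Counter

def toy_perplexity(text: str, reference_corpus: str) -> float:
    """Perplexity as the exponential of the average negative log-probability
    of each word under a unigram model estimated from a reference corpus.
    Lower values mean the text looks more 'predictable' to the model."""
    ref_words = reference_corpus.lower().split()
    counts = Counter(ref_words)
    total, vocab = len(ref_words), len(counts)
    words = text.lower().split()
    # Laplace smoothing so unseen words do not yield zero probability.
    log_prob = sum(math.log((counts[w] + 1) / (total + vocab)) for w in words)
    return math.exp(-log_prob / max(len(words), 1))

def toy_burstiness(text: str) -> float:
    """Burstiness as the coefficient of variation of sentence lengths:
    higher values suggest the short/long alternation typical of human prose."""
    lengths = [len(s.split()) for s in re.split(r"[.!?]+", text) if s.strip()]
    if len(lengths) < 2:
        return 0.0
    mean = sum(lengths) / len(lengths)
    std = math.sqrt(sum((n - mean) ** 2 for n in lengths) / len(lengths))
    return std / mean

reference = "we the people of the united states in order to form a more perfect union"
sample = "We the people of the United States. In order to form a more perfect union."
print(f"perplexity: {toy_perplexity(sample, reference):.2f}")
print(f"burstiness: {toy_burstiness(sample):.2f}")
```

Note that both measures depend only on surface statistics of the text, which is exactly why they overlap for formal, formulaic human writing and polished AI output.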
The difficulty in using perplexity and burstiness for reliable detection lies in the overlap between human and AI writing traits. Human authors can produce predictable or structured language, especially in certain writing styles. At the same time, advanced AI models increasingly incorporate variety and randomness, making their text less uniform and more human-like. The result is a high rate of false positives and negatives, which undermines the accuracy of detection tools. As generative AI evolves, it becomes harder for detection tools to keep pace, and without additional contextual or semantic analysis, distinguishing AI-generated text from human writing will remain a complex challenge.
Are the United States Constitution and the Bible AI-generated content? The flaws of the most popular AI detection tools
Numerous examples of AI detection tools wrongly flagging human writing as AI-generated have been reported across the internet. Still, the two most paradigmatic cases – and personally, our favorites – involve the United States Constitution and the Bible.
Both cases are cited and well explained by Benj Edwards. Briefly, in a test conducted in April 2023, GPTZero indicated that 96.21% of the content of the 1787 United States Constitution was created by AI; when tested on a part of the Book of Genesis from the Bible, the same tool produced a score of 88.2% AI-generated content.
AI tools designed to detect AI-generated writing often struggle with texts like the US Constitution and the Bible, leading to false positives. This happens because these tools use metrics such as perplexity, which measures how predictable the text is. The language in the Constitution and the Bible is highly formal, repetitive, and structured, making it very predictable. Since generative AI models are trained on extensive datasets that include these well-known texts, they become good at mimicking this style. When an AI detection tool evaluates these texts, the low perplexity score (indicating high predictability) causes the tool to mistakenly classify them as AI-generated.
Is watermarking an effective solution?
Detecting AI-generated content is becoming increasingly difficult, and various tools are being developed to tackle this problem by verifying content provenance. Some approaches focus on cryptographic signatures or metadata tracking, which involve embedding secure information in the content itself to prove its origin. Other techniques include using external registries that log the creation details of digital content. Despite these innovations, watermarking remains the most popular and widely discussed solution for text verification. It stands out because it integrates directly with the AI model’s language generation process, making it an effective way to identify content created by AI.
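As a concrete illustration of the signature and metadata approaches mentioned above, the sketch below bundles a piece of text with a signed provenance manifest. It is a minimal sketch under our own assumptions: the field names are invented for the example, and an HMAC with a shared secret stands in for the public-key signature or C2PA-style manifest a production system would use.

```python
import hashlib
import hmac
import json

SECRET_KEY = b"provider-signing-key"  # stand-in for a real private key

def attach_provenance(text: str, model_name: str) -> dict:
    """Bundle the text with a provenance manifest whose signature covers
    both the content hash and the metadata, so tampering is detectable."""
    manifest = {
        "content_sha256": hashlib.sha256(text.encode()).hexdigest(),
        "generator": model_name,
        "label": "artificially generated",
    }
    payload = json.dumps(manifest, sort_keys=True).encode()
    manifest["signature"] = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return {"text": text, "provenance": manifest}

def verify_provenance(record: dict) -> bool:
    """Recompute the content hash and the signature; both must match."""
    manifest = dict(record["provenance"])
    signature = manifest.pop("signature")
    payload = json.dumps(manifest, sort_keys=True).encode()
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    content_ok = manifest["content_sha256"] == hashlib.sha256(record["text"].encode()).hexdigest()
    return content_ok and hmac.compare_digest(signature, expected)

record = attach_provenance("This essay was drafted with a language model.", "example-model")
print(verify_provenance(record))   # True
record["text"] += " (edited)"
print(verify_provenance(record))   # False: the content hash no longer matches
```

Because the signature covers both the content hash and the metadata, any edit to either invalidates the record, which is the property metadata-tracking schemes and external registries rely on.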
The way watermarking works is by subtly adjusting how AI models select words or phrases when generating text. For example, an AI model can be programmed to use certain word combinations or sentence structures in a way that forms a detectable pattern. These adjustments are so minor that they don’t affect the overall flow or quality of the writing, but they create a recognizable “fingerprint” that can be scanned and confirmed later. This technology has the potential to make AI content more transparent and easier to identify, which is crucial for fields like education and media, where the authenticity of writing is important.
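The sketch below shows, in simplified form, how such a statistical fingerprint can be checked. It follows the “green list” idea discussed in the watermarking literature: a secret key and the previous word pseudo-randomly split the vocabulary in half, generation is biased toward the “green” half, and detection measures how far the observed share of green words deviates from chance. The key, the 50/50 split, and the threshold mentioned in the comment are assumptions of ours, not any vendor’s actual scheme.

```python
import hashlib
import math

SECRET_KEY = "watermark-secret"
GREEN_FRACTION = 0.5  # during generation, sampling is biased toward 'green' words

def is_green(previous_word: str, word: str) -> bool:
    """Hash the previous word together with a secret key to pseudo-randomly
    assign each candidate next word to the 'green' or 'red' half of the vocabulary."""
    digest = hashlib.sha256(f"{SECRET_KEY}|{previous_word}|{word}".encode()).digest()
    return digest[0] < 256 * GREEN_FRACTION

def green_score(text: str) -> float:
    """Detection statistic: a z-score for the observed share of green words.
    Unwatermarked text should hover near the expected fraction; text whose
    generation was biased toward green words scores noticeably higher."""
    words = text.lower().split()
    hits = sum(is_green(prev, cur) for prev, cur in zip(words, words[1:]))
    n = max(len(words) - 1, 1)
    expected = n * GREEN_FRACTION
    std = math.sqrt(n * GREEN_FRACTION * (1 - GREEN_FRACTION))
    return (hits - expected) / std

# A detector would flag text whose z-score exceeds some threshold, e.g. 4.
print(green_score("an ordinary sentence written without any watermark at all"))
```

Because the statistic aggregates over many word pairs, confidence grows with the length of the passage, which is why detection claims tend to be framed in terms of large sections of text.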
According to the Wall Street Journal, OpenAI has developed a watermarking tool that is reportedly 99.9% accurate when identifying large sections of text, but the company has chosen not to release it yet. The company is concerned about potential unintended consequences, such as false accusations and the technology disproportionately affecting non-native English speakers. However, a significant factor in the delay is market pressure: a survey conducted by OpenAI revealed that nearly a third of ChatGPT users would be discouraged from using the tool if watermarking technology were implemented. This has created internal debate, as OpenAI must balance its responsibility to promote transparency and combat misuse with the need to keep its user base satisfied and attract new users. As a result, the company has taken a cautious approach, weighing these competing interests before moving forward.
Even with its potential, watermarking is not foolproof. It can be bypassed with relatively simple methods, like translating the text to another language and back or making small edits that break the pattern. These weaknesses have raised questions about whether watermarking can serve as a long-term solution. Nevertheless, as AI-generated content becomes more widespread, the need for reliable ways to verify content provenance is urgent, and watermarking is still a significant step forward in addressing these challenges.
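To see why small edits matter, the same toy detector can be run on a simulated watermarked sequence before and after a round of random substitutions. This continues the illustrative sketch above: the random five-letter “words”, the 200-word length, and the one-in-five replacement rate are arbitrary choices for the demonstration, not measurements of any real system.

```python
import hashlib
import random
import string

SECRET_KEY = "watermark-secret"

def is_green(prev, word):
    # Same toy green-list rule as in the previous sketch.
    return hashlib.sha256(f"{SECRET_KEY}|{prev}|{word}".encode()).digest()[0] < 128

def random_word():
    return "".join(random.choices(string.ascii_lowercase, k=5))

def green_fraction(words):
    pairs = list(zip(words, words[1:]))
    return sum(is_green(p, w) for p, w in pairs) / len(pairs)

# Simulate watermarked generation: keep sampling until a 'green' word is found.
random.seed(0)
words = [random_word()]
while len(words) < 200:
    candidate = random_word()
    if is_green(words[-1], candidate):
        words.append(candidate)

# Simulate light editing or paraphrasing: replace one word in five at random.
edited = [random_word() if random.random() < 0.2 else w for w in words]

print("green fraction before edits:", round(green_fraction(words), 2))   # 1.0 by construction
print("green fraction after edits: ", round(green_fraction(edited), 2))  # noticeably lower
```

Each replaced word disturbs up to two of the word pairs the detector counts, so even modest paraphrasing pushes the statistic back toward chance.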
The future of AI text detection
Looking ahead, the future of AI governance will increasingly emphasize content provenance and the ethical use of generative models. In alignment with global best practices such as Singapore’s Model AI Governance Framework for Generative AI and the White House’s Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence, frameworks like the European Union’s AI Act (AIA), which came into force in August this year, are tackling this challenge head-on. Article 50(2) of the AIA requires providers of AI systems to ensure that the outputs of their models are marked “in a machine-readable format and detectable as artificially generated or manipulated.” The act goes further in Recital 133, suggesting methods such as watermarks, metadata, cryptographic techniques, and fingerprints to ensure content authenticity. These measures are meant to create a more transparent and accountable digital ecosystem.
And things may be bleaker from a technical perspective too: as researchers such as Sadasivan et al. and Zhang et al. have shown repeatedly, AI detection tools carry a substantial risk of flagging genuinely human-written text as AI-generated, and they can be defeated by copy-paste attacks, word-level attacks, repeated paraphrasing attacks, and more. These attacks can allow AI-generated text to pass a wide range of detectors, including watermarking-based ones, zero-shot classifiers, and neural network-based detectors. However, this does not rob watermarking of its usefulness and importance: some watermarking schemes hold up better than others against these attacks, even simply using a longer watermark helps, and the search for better methods of generating watermarks continues.
Soon, deploying content provenance mechanisms like watermarking will become a regulatory necessity, as frameworks like the AIA address the ethical concerns and societal harms posed by generative AI. The potential for misinformation, academic dishonesty, and the manipulation of public opinion has made robust provenance tools essential. While current technologies like watermarking have their weaknesses and can be bypassed, these limitations should not be an excuse to delay action. As regulations and AI governance frameworks put pressure on the market, they will drive the evolution and improvement of these technologies. By enforcing requirements for labeling AI-generated content and ensuring transparency through advanced methods, these regulations aim to create a digital environment where AI advancements are balanced with accountability, continuous technological progress, and the protection of public trust.
BIOS
Nauani Benevides is a co-editor of the EMILDAI Blog. She is a Brazilian-qualified lawyer with extensive experience in litigation, corporate law, and legal research. Currently pursuing a European Master in Law, Data, and Artificial Intelligence (EMILDAI), she specializes in the intersection of law and technology and has a keen interest in Artificial Intelligence Governance. Nauani is also a certified Privacy and Data Protection Specialist (CIPP/E) and Artificial Intelligence Governance Professional (AIGP), as well as a member of the International Association of Privacy Professionals (IAPP).
Our other co-editor, Meem Arafat Manab, is a postgraduate student and researcher based intermittently in Dublin and Spain. Proudly representing EMILDAI’s second cohort, Meem’s professional interests range from digital entrepreneurship to equitable policymaking for the marginalized. They are currently working as a data scientist with Panacea Cooperatives, a research group based in Granada and Leon, while also collaborating with researchers in the UK, the US, France, and Sweden.