AI tools have permeated all spheres of human life, from apps that handle repetitive tasks to complex solutions for creative work. Artificial intelligence is becoming remarkably human-like, making AI-generated content increasingly hard to spot, and AI detectors serve as the "first line of defense" for telling human-made content from machine-made. While this article mostly concerns text, AI content detectors can also work with images, videos, and even sound, which is crucial for protection against deepfakes and other scams.
In this material, we'll introduce you to the concept of AI detection, provide examples of its application, and explain the basics of how AI detectors work. You'll also learn why false positives or false negatives may occur and how to bypass algorithms (and whether it's worth your while).
Introduction to AI detection
Let's imagine a typical situation: you come across a claim or a fact, try to investigate it, google it, and find a few articles. But how do you know whether they were written by a human expert or generated by an AI tool that scraped information from every available source, regardless of reliability? You don't.
Or you see a beautiful photo and get interested in who the model is, only to learn that she doesn't exist because the image was also created by an artificial intelligence.
The truth is, nowadays you can never be 100% sure who the real content author is or whether you can trust it. While sometimes it doesn't really matter — who cares that an illustration is AI-generated if it serves its purpose — it would be disappointing to find out that the thought-provoking and insightful poem you've just read was written by a soulless program.
That's why AI detectors are gaining popularity not only for checking texts but also for visual content. The progress is irreversible and irrepressible, but responsible AI use is the key, and detectors are meant to help us with it.
How text detectors work
How do AI detectors work? They analyze written content using several parameters. Machine learning and natural language processing enable programs to "scan" the text and evaluate the following aspects.
Perplexity
In simple terms, perplexity measures how predictable the text is to a language model. For example, if a sentence starts with "I like my coffee", we expect it to end with something like "hot, strong, and black", not with "though I like my pillow more". Even though the second option is grammatically correct, it makes little sense (at least without extended context). People tend to make more surprising language choices, so human text scores higher, while AI-generated content is usually more predictable, with low perplexity.
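To make this concrete, here is a minimal sketch of how perplexity is computed from the probabilities a language model assigns to each successive word. The probability values below are invented for illustration; real detectors query an actual model.

```python
import math

def perplexity(token_probs):
    """Perplexity = exp of the average negative log-probability.

    token_probs: the probability a language model assigned to each
    successive word. Lower perplexity means more predictable text.
    """
    avg_neg_logprob = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_logprob)

# Hypothetical model probabilities for two endings of "I like my coffee ..."
predictable = [0.6, 0.5, 0.7]    # "hot, strong, and black"
surprising = [0.05, 0.02, 0.1]   # "though I like my pillow more"

print(perplexity(predictable))   # low: the kind of score AI text tends to get
print(perplexity(surprising))    # high: unexpected, more "human" word choices
```

A detector built on this idea flags text whose perplexity stays suspiciously low from start to finish.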
Burstiness
In other words, it's the variation in the text's rhythm. Human writing can and sometimes even should be inconsistent: some sentences are long and complex, while others are short and simple. People may use advanced grammar patterns in one part and very basic sentence structure in another. AI-generated text usually sticks to one pattern throughout (unless instructed otherwise), so low burstiness is a red flag.
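A rough burstiness score can be sketched as the spread of sentence lengths. This toy function and its example texts are an illustration of the idea, not any real detector's metric.

```python
import re
import statistics

def burstiness(text):
    """Toy burstiness score: standard deviation of sentence lengths
    (in words). Higher values mean a more varied, 'bursty' rhythm,
    which is typical of human writing."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    return statistics.pstdev(lengths)

human_like = ("It rained. The old bridge groaned under decades of "
              "traffic and neglect. We ran.")
ai_like = "The weather was bad. The bridge was old. The people were fast."

print(burstiness(human_like) > burstiness(ai_like))  # True
```

The uniform, equally-sized sentences in the second sample score near zero, which is exactly the monotony this metric is meant to catch.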
Perplexity and burstiness are key linguistic indicators, but not the only ones.
Watermarking
Yes, it is exactly what it sounds like: embedding subtle statistical "marks" (patterns in word choice) into the text that read naturally but that AI content detectors can recognize.
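As a toy illustration of one published statistical-watermarking idea (not any vendor's actual scheme), a generator can pseudo-randomly split the vocabulary into a "green" and a "red" half based on the previous word and prefer green words; a detector then measures how often the text lands in the green half.

```python
import hashlib

def is_green(prev_word, word):
    """Toy rule: hash the (previous word, word) pair to assign the word
    to the 'green' or 'red' half of the vocabulary, deterministically."""
    digest = hashlib.sha256(f"{prev_word}:{word}".encode()).digest()
    return digest[0] % 2 == 0

def green_fraction(words):
    """Fraction of words falling in the green list. A watermarking
    generator is biased toward green words, so a fraction well above
    0.5 over a long text suggests watermarked (AI) output."""
    hits = [is_green(a, b) for a, b in zip(words, words[1:])]
    return sum(hits) / len(hits)

text = "the quick brown fox jumps over the lazy dog".split()
print(green_fraction(text))  # unwatermarked text hovers around 0.5 on average
```

Because the split is keyed by the previous word, the mark survives verbatim copying but is invisible to a reader; real schemes add significance testing on top.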
Vectors
Also known as embeddings, vectors are numerical representations of words that capture their meaning. Even words that are spelled identically, like "wind" (noun) and "to wind" (verb), get different vectors in context. This technique helps AI detectors recognize AI-generated content more reliably.
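A tiny sketch of how systems compare embeddings: cosine similarity between vectors. The 4-dimensional vectors below are invented for illustration; real embeddings have hundreds of dimensions and come from a trained model.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means identical
    direction (similar meaning), values near 0 mean unrelated."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Hypothetical embeddings for two senses of "wind" and for "breeze".
wind_noun = [0.9, 0.1, 0.3, 0.0]   # "the wind blew"  -> weather-related
wind_verb = [0.1, 0.8, 0.0, 0.4]   # "wind the clock" -> action-related
breeze = [0.8, 0.2, 0.4, 0.1]

print(cosine_similarity(wind_noun, breeze))     # high: related meanings
print(cosine_similarity(wind_noun, wind_verb))  # low: same spelling, different sense
```

The same spelling ends up far apart in vector space, which is how a detector can reason about meaning rather than surface text.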
Manual detection
If you are a teacher reviewing a student's assignment, you can spot unusual word choice, overly advanced vocabulary, and other signs of AI-generated text. With an anonymous piece of writing, for example, on the internet, human-written text is more likely to have a personal touch, jokes (even not very funny ones), and an informal tone.
Analyzing a document's edit history can also come in handy: if a single edit adds large chunks of text at once, it looks suspicious (or the author simply pasted them from another document).
How image detectors work
Not long ago, AI-generated images were easy to detect because of numerous "hallucinations" — wrong numbers of fingers, sometimes even arms and legs, unnatural postures, etc. We even compared several free generative AI tools a year ago, and almost all of them shared the same problems.
Digital noise analysis still helps, but it's less reliable than it used to be as AI generators keep evolving. One of the last frontiers, though, is embedded text that AI struggles to replicate accurately.
Last but not least: neural fingerprinting. Image and video detectors match characteristic pixel-level patterns against the known "fingerprints" of AI image generators like Midjourney or DALL-E.
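The core idea behind digital noise analysis can be sketched as measuring the high-frequency residue that remains after smoothing an image: camera sensors leave irregular noise, while many generators leave smoother or oddly patterned residue. The tiny grayscale grids below are illustrative only; this is not a production forensic method.

```python
import random

def noise_residual_energy(img):
    """Toy noise analysis: subtract each interior pixel's 4-neighbour
    average and return the mean squared residual. img is a 2D list of
    grayscale values."""
    h, w = len(img), len(img[0])
    total = 0.0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            local = (img[y-1][x] + img[y+1][x] + img[y][x-1] + img[y][x+1]) / 4
            total += (img[y][x] - local) ** 2
    return total / ((h - 2) * (w - 2))

random.seed(0)
# An 8x8 patch with sensor-like noise vs. a perfectly smooth patch.
noisy = [[128 + random.gauss(0, 5) for _ in range(8)] for _ in range(8)]
smooth = [[128.0 for _ in range(8)] for _ in range(8)]

print(noise_residual_energy(noisy) > noise_residual_energy(smooth))  # True
```

Real forensic tools compare such residual statistics (and generator fingerprints) across the whole image, not a single patch.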
How video and deepfake detectors work
Deepfakes are getting smarter: they can imitate real public figures, spread fake news, and cause panic. This technique is also used to breach security and obtain sensitive data from individuals and corporations.
How do AI detectors work in that case? They look for the following telltale signs:
- Inconsistencies such as flickering, unnatural body movements, or shifting shadows.
- Facial expressions and physiological signals, like strange blinking.
- Asynchronous sound and lip movements.
- Hidden watermarks and metadata left by generative AI tools in their outputs.
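The lip-sync check in the list above can be sketched as finding the time offset at which audio loudness and mouth openness correlate best: a genuine recording should peak near zero lag, while a dubbed or generated video often peaks elsewhere. The per-frame numbers below are invented for illustration.

```python
def best_lag(audio, mouth, max_lag=3):
    """Return the frame offset at which the audio loudness track and the
    mouth-openness track correlate most strongly."""
    def corr_at(lag):
        pairs = [(audio[i], mouth[i + lag])
                 for i in range(len(audio))
                 if 0 <= i + lag < len(mouth)]
        return sum(a * m for a, m in pairs)
    return max(range(-max_lag, max_lag + 1), key=corr_at)

audio = [0, 1, 0, 1, 0, 1, 0, 1]      # loudness per video frame
synced = [0, 1, 0, 1, 0, 1, 0, 1]     # mouth opens exactly with the sound
shifted = [1, 0, 1, 0, 1, 0, 1, 0]    # mouth is one frame off

print(best_lag(audio, synced))   # 0: consistent with a real recording
print(best_lag(audio, shifted))  # non-zero: suspicious desynchronization
```

Production deepfake detectors use learned audio-visual features instead of raw loudness, but the synchronization reasoning is the same.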
Are AI detectors reliable?
No, they are not. They may produce false positives and flag human-written content as AI-generated. This can happen for several reasons.
First, non-native English speakers might use simple grammar and counterintuitive word choices. Second, some types of writing, such as documentation and manuals, leave little room for stylistic variety. Last but not least, sentence length also matters: short phrases are more likely to be flagged as AI writing.
False negatives are also quite common. AI outputs edited by a human writer may "pass the test", as may content processed by a "humanizer" app.
For example, this very article was written by a very human being, I swear. Last time I had a medical checkup, it didn't reveal any cyborg parts. Yet stealthwriter.ai's verdict was that 65% of my text was AI-generated. Detecting-ai.com was more merciful, seeing just 26.9% of my writing as AI-generated, and sapling.ai was also less harsh, labeling just 34.2% as fake.
That's actually all you need to know about AI detectors' reliability and their ability to recognize human writing. In my case, I guess, the output was heavily influenced by quotes and a spellchecker (guilty, Your Honor!).
Applications and uses
AI-generated content could open up new horizons of human creativity — but only if practiced with conscience.
AI detection comes in handy not just when you want to be sure that a random article on the Internet was written by a human. Here is a short, by no means exhaustive, list of possible applications for AI detectors.
- Academia. Sly students love writing their assignments with the help of AI.
- Science. Scientific articles should be created by real experts.
- Publishing and journalism. AI text generators lack integrity, emotional intensity, and soul.
- Social media. Platforms increasingly require AI labeling to maintain transparency. Also, followers are tired of shallow, samey content, so human-written text may become your advantage in this competition.
To sum up, AI detection is applicable in various spheres, from private needs (verifying the reliability of an information source) to business and educational ones.
AI detectors vs. plagiarism checkers
Plagiarism checkers appeared much earlier than AI detectors and have been in use for well over a decade. Both help a lot, but they serve different aims.
AI detector tools just detect authorship, whether it's human or AI writing. The content itself might well be unique and authentic.
Plagiarism checkers verify whether the content is original or whether it was compiled or "borrowed" from other sources with the help of that time-honored scholarly technique, "copy and paste".
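Many plagiarism checkers rest on n-gram overlap: counting how many short word sequences a document shares with a suspected source. The minimal sketch below uses word trigrams; real checkers index billions of documents and handle paraphrasing, which this toy does not.

```python
def ngrams(text, n=3):
    """Set of all n-word sequences in the text, lowercased."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap(doc, source, n=3):
    """Share of the document's n-grams that also appear in the source.
    Values near 1.0 indicate heavy copying."""
    doc_ngrams = ngrams(doc, n)
    if not doc_ngrams:
        return 0.0
    return len(doc_ngrams & ngrams(source, n)) / len(doc_ngrams)

source = "ai detectors analyze written content using several parameters"
copied = "AI detectors analyze written content using several parameters today"
fresh = "plagiarism checkers compare documents against indexed sources"

print(overlap(copied, source))  # high: mostly copy-pasted
print(overlap(fresh, source))   # 0.0: original wording
```

Note how this differs from an AI detector: the "fresh" sentence scores zero here even if a machine wrote it, because overlap measures copying, not authorship.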
Both tools work best in combination: this way, you get a full text assessment.
Manual vs. algorithmic detection
Detecting AI writing manually relies on experience and personal knowledge of the writer. For example, a teacher or tutor who knows their students' styles can easily spot inconsistencies and oddities in someone's assignment, while an AI detector may prove ineffective, since basic natural language processing (NLP) detectors can be bypassed by AI humanizers.
On the other hand, in more generic cases, when dealing with a random text from an unknown source, AI detectors work more efficiently. Overall, a hybrid approach might give the best results.
Best practices and the future of AI detection
Here is a short list of our hints and tips to help you not get lost in the abundance of AI tools and their versatility. As it's a relatively new segment of AI usage (such an irony! AI tools help detect AI-generated content), you have to be especially cautious.
- Use multiple AI detection tools, including human review, to achieve the most reliable result.
- Don't forget to update your content detectors, as they keep evolving (along with AI writing tools).
- "Everybody lies". Keep in mind that false positives are not rare; don't rush to accuse an author based on a detector's output.
And above all, even if content is recognized as AI-generated, it's not the end of the world. There are lots of situations where it's totally acceptable, especially if the text was manually edited and revised.
When artwork is invented by a machine, it loses its most important power: to help people connect.
In the future, the domain will undoubtedly evolve. The tools will become more accurate and reliable, and legal regulations and restrictions may follow. We foresee a few possible changes:
- Content credentials (C2PA): cryptographically signed metadata will prove content provenance and human authorship.
- Invisible AI writing watermarks will become a default feature.
- Detectors will work with several types of content instead of only one (text, sound, video, or image).
- Public domains and resources will demand AI content labeling (or vice versa, a "human writing" mark).
Let's meet here in a year and see if we were right or not :)
FAQ
What are AI detectors?
AI detectors are apps, sites, or software that determine whether content was generated by a human writer or an artificial intelligence. These tools analyze sentence length and structure, look for metadata and hidden watermarks, and use other techniques.
Can AI detectors recognize deepfakes?
Partially. They can sometimes detect deepfakes, but advanced AI models can bypass detectors quite effectively.
Do AI "humanizers" help bypass detection?
Yes, but they can make text sound "unnatural" and weird.
Is it against the rules to use AI for writing?
It depends heavily on the workplace's or educational body's policies, as there are no official laws and regulations yet.
How accurate are AI detectors?
They are not yet fully accurate. False positives can "reveal" a high percentage of AI content even in text that was written manually and merely revised for grammar or spelling with the help of a tool.