How does AI detection work? Signals, scores and limits.

Every AI detector, free or institutional, runs on a small set of statistical ideas. Understand them and detector scores stop being magic numbers, in both directions: you will trust them more where they deserve it and less where they do not.

On this page

Perplexity: the predictability meter
Burstiness: the rhythm meter
Classifiers: the trained judges
Watermarks: the planted signal
The four methods, compared
Why every score is an estimate
What the limits mean in practice
Sources and further reading

Perplexity: the predictability meter

Language models assign probabilities to the next word. Perplexity measures how surprised a model is by a text: if every next word is exactly what the model would have guessed, perplexity is low; if the text keeps zigging where the model expected it to zag, perplexity is high. AI-generated text is, almost by construction, built from words a model rated as highly probable, so it tends to score low perplexity. Human writing is messier, full of odd word choices and tangents, and tends to score higher. Detectors exploit the gap. The catch: plenty of human writing is also predictable (contracts, lab reports, anything written to a template), and it inherits low perplexity and the false flags that come with it.
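
To make the definition concrete, here is a minimal sketch of the arithmetic: perplexity is the exponential of the average negative log-probability per token. The probability lists are invented for illustration; in a real detector they would come from a scoring language model, which this sketch does not include.

```python
import math

def perplexity(token_probs):
    """Exponential of the average negative log-probability per token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)

# Hypothetical per-token probabilities (assumed values, not real model output).
predictable = [0.9, 0.8, 0.85, 0.9]   # the model guessed nearly every word
surprising = [0.2, 0.05, 0.1, 0.3]    # the model was often caught off guard

print("predictable text:", round(perplexity(predictable), 2))
print("surprising text: ", round(perplexity(surprising), 2))
```

A handy sanity check: if every token has probability 0.5, perplexity is 2, as if the model were choosing between two equally likely words at each step.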

Burstiness: the rhythm meter

Burstiness measures variance, mostly in sentence length and structure. People write in bursts: a 40-word sentence, then a 5-word one, a question, a fragment. Models distribute sentence lengths tightly around their average, producing the metronome cadence you can hear by reading AI text aloud. Low burstiness plus low perplexity is the classic machine signature, which is also why the most effective manual humanizing technique is simply breaking the rhythm.
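
There is no single agreed formula for burstiness. One simple proxy, assumed here purely for illustration, is the coefficient of variation of sentence lengths: standard deviation divided by mean. The sentence splitter is deliberately naive and would stumble on abbreviations or decimals.

```python
import re
import statistics

def sentence_lengths(text):
    """Word count per sentence, splitting naively on ., ! and ?"""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]

def burstiness(text):
    """Coefficient of variation of sentence length (an assumed proxy)."""
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0  # one sentence has no rhythm to measure
    return statistics.pstdev(lengths) / statistics.mean(lengths)

uniform = "The cat sat down. The dog ran off. The bird flew away."
varied = ("Stop. The storm that had been building over the hills "
          "all afternoon finally broke. Why now?")

print("uniform:", burstiness(uniform))
print("varied: ", round(burstiness(varied), 2))
```

The uniform sample scores zero because every sentence has four words; the varied one mixes a one-word sentence with a thirteen-word one and scores far higher.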

Classifiers: the trained judges

Modern commercial detectors (Turnitin, GPTZero and the rest) go beyond raw statistics: they train neural classifiers on large corpora of labelled human and AI text. The classifier learns thousands of subtle features and outputs a probability, which the product rounds into a score and a verdict. Classifiers beat raw perplexity on accuracy, but they inherit their training data's blind spots: text from newer models they have not seen, hybrid human-AI documents, and writing styles underrepresented in training. That last gap is the documented mechanism behind elevated false positives for non-native English writers.
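
The train-then-score loop can be sketched at toy scale. This is not how any named product works: it swaps the neural network for plain logistic regression, uses two hand-picked features instead of thousands of learned ones, and trains on eight invented data points. All names and numbers are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, labels, lr=0.5, epochs=2000):
    """Fit logistic regression with full-batch gradient descent."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(samples, labels):
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for i, xi in enumerate(x):
                grad_w[i] += error * xi
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def ai_probability(features, weights, bias):
    """Probability that the text is AI-written, according to the toy model."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, features)) + bias)

# Invented training data: [normalised perplexity, burstiness]; label 1 = AI.
samples = [[0.2, 0.1], [0.3, 0.2], [0.1, 0.15], [0.25, 0.3],
           [0.8, 0.9], [0.7, 0.6], [0.9, 0.8], [0.6, 0.7]]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

weights, bias = train(samples, labels)
for features in ([0.15, 0.1], [0.85, 0.8]):
    p = ai_probability(features, weights, bias)
    print(features, "->", round(p * 100), "AI" if p >= 0.5 else "human")
```

The blind spot described above shows up even here: the model can only draw a boundary between the examples it was given, so a careful human writer whose features fall near the AI cluster gets flagged.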

Watermarks: the planted signal

Watermarking flips the problem: instead of detecting AI text after the fact, the generating model subtly biases its word choices according to a secret key, so a verifier holding the key can later test for the pattern. The research is real, and Google DeepMind has deployed SynthID for text in some contexts. But there is no universal watermark across AI models in 2026, no standard, and a fundamental fragility: rewriting the text, by hand or by humanizer, typically destroys the statistical pattern. Watermarks may eventually matter for provenance of unedited model output; they do not rescue detection of edited text.
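
The key-and-verifier idea can be sketched with a generic "green list" scheme of the kind described in the research literature. This is an illustrative toy, not SynthID's algorithm: the hashing rule, the vocabulary, the key and the random "model" are all assumptions made for the example.

```python
import hashlib
import math
import random

def is_green(key, prev_token, token):
    """Keyed coin flip: roughly half of all (prev, token) pairs come up green."""
    data = f"{key}|{prev_token}|{token}".encode("utf-8")
    return hashlib.sha256(data).digest()[0] % 2 == 0

def generate_watermarked(key, vocab, length, seed=0):
    """Toy 'model': picks random words, but from the green half when it can."""
    rng = random.Random(seed)
    tokens = [rng.choice(vocab)]
    while len(tokens) < length:
        green = [w for w in vocab if is_green(key, tokens[-1], w)]
        tokens.append(rng.choice(green or vocab))
    return tokens

def watermark_z_score(tokens, key):
    """How far the green count sits above the 50% expected by chance."""
    pairs = list(zip(tokens, tokens[1:]))
    if not pairs:
        raise ValueError("need at least two tokens")
    greens = sum(is_green(key, prev, tok) for prev, tok in pairs)
    n = len(pairs)
    return (greens - 0.5 * n) / math.sqrt(0.25 * n)

VOCAB = ("the a cat dog river stone quickly slowly runs sleeps "
         "under over red blue old new and but near far").split()

text = generate_watermarked("secret-key", VOCAB, 60)
print("z-score with the right key:", round(watermark_z_score(text, "secret-key"), 2))
```

A verifier holding the key sees far more green pairs than chance allows. Every word that a rewrite swaps out changes two pairs, each of which falls back to a coin flip, which is why editing erodes the signal so quickly.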

The four methods, compared

Method	Measures	Strength	Blind spot
Perplexity	Word predictability	Cheap, needs no labelled training data	Formal human prose scores low too
Burstiness	Rhythm variance	Matches human intuition	Trained writers vary less
Classifier	Learned patterns	Best overall accuracy	New models, hybrids, dialects
Watermark	Planted signal	Cryptographic when present	No standard; dies on rewrite

Notice what the table implies when read column by column: each method covers a different slice, every blind spot overlaps real human writing somewhere, and the strongest method, the classifier, is also the one that ages fastest. Commercial products blend several methods precisely because no single one survives contact with the real distribution of texts.

Why every score is an estimate

Stack the methods and you still get probability, not proof. The same text scores differently across detectors because each was trained on different data and sets its thresholds differently. Short texts give all methods too little signal, which is why our detector requires at least 120 characters and why sentence-level highlights everywhere are less reliable than document scores. Edited and hybrid texts blur every signal at once. A detector score is a smoke alarm: worth listening to, wrong often enough that nobody should be convicted by it alone. That is exactly how we present scores on this site, inconclusive band included, and the humanizer plus re-check loop exists so you can watch the statistics move rather than take anyone's word for them.
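
A reporting layer built on those principles might look like the sketch below. The 120-character figure comes from the text above and is read here as a minimum length; the band cutoffs and labels are hypothetical, chosen only to show how an inconclusive middle band keeps a borderline probability from turning into a verdict.

```python
MIN_CHARS = 120    # from the article, interpreted as a minimum length
LOW_CUTOFF = 35    # hypothetical: at or below this, "likely human"
HIGH_CUTOFF = 65   # hypothetical: at or above this, "likely AI"

def report(text, ai_probability):
    """Turn a raw probability into an integer score and one of three bands."""
    if not 0.0 <= ai_probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if len(text) < MIN_CHARS:
        # Too little signal: refuse to score rather than guess.
        return {"score": None, "band": "too short"}
    score = round(ai_probability * 100)
    if score <= LOW_CUTOFF:
        band = "likely human"
    elif score >= HIGH_CUTOFF:
        band = "likely AI"
    else:
        band = "inconclusive"
    return {"score": score, "band": band}

sample = "word " * 40  # 200 characters, long enough to score
for p in (0.1, 0.5, 0.9):
    print(p, report(sample, p))
print(report("Too short.", 0.9))
```

Two detectors that agree on the probability can still disagree on the verdict simply by setting these cutoffs differently.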

What the limits mean in practice

For writers, the practical reading is calm: low scores on your own work are normal, occasional weird scores are normal too, and a paper trail of drafts beats any score in a dispute. Write in tools that keep history and the detector era mostly cannot touch you.

For teachers and institutions, the evidence supports using detectors as one signal among several, never as an autonomous judge. A flag should open a conversation about process (drafts, notes, the student's explanation), because the documented false positive bias means score-only enforcement punishes exactly the wrong students. Several institutions have disabled AI detection entirely on these grounds; the ones that keep it owe their students the same disclaimer the technology owes everyone.

For tool builders, us included, the obligation is honesty about the middle of the distribution. Detection works at the extremes and blurs in between, so an interface that shows three confident decimal places is performing certainty the mathematics does not contain. That is why our interface shows a plain integer, three bands and a disclaimer. The number is real; the theatre is optional.

Keep that frame and the whole detection debate becomes manageable: not a war between cheaters and catchers, but an ordinary measurement problem with ordinary measurement ethics. Measure carefully, report uncertainty, never punish on a single reading. Every field that measures anything learned these rules long ago; AI text detection is just the newest field that has to.

Sources and further reading

Frequently asked

What signals do AI detectors look at?

Mostly predictability (perplexity), uniformity of sentence rhythm (burstiness), and patterns learned by classifiers trained on known AI and human text. Some systems also look for watermarks.

Why do detectors disagree with each other?

Different training data, different thresholds, different models. The same text can score 20 on one detector and 80 on another, which alone tells you how much uncertainty is involved.

Are AI detection watermarks real?

Watermarking exists in research and in some deployed systems, but there is no universal watermark across AI models in 2026, and plain rewriting typically destroys statistical watermarks.

Can detectors identify which model wrote a text?

Not reliably. Some research classifiers attempt model attribution, but commercial detectors output a generic AI likelihood. Model-specific fingerprints fade fast as models converge on similar training recipes.

Do detectors improve as models improve?

They improve at detecting yesterday’s models. Each new model generation writes with more human-like variance, which erodes the gap detectors rely on. The long-term trend favours provenance solutions over statistical detection.