How Uncanny Atlas works

It is difficult to know what on the internet is 'real' and what is generated by AI. People generally point to specific things when declaring “that's AI”: the hands, the light, a plastic sheen they can't quite name. On r/isthisAI and r/RealOrAI, thousands of people argue about exactly that in the comments.

This tool is an attempt to turn that pile of arguments - currently 912,187 comments - into a map of the indicators people actually use. This page explains how, from first principles. No prior knowledge of the project is assumed; just scroll, and play with the figures.

The whole job sounds simple: find every comment that names an indicator, work out which indicator, and count them. The trouble is hiding in the middle. Let's build up to it.

1 · A thousand ways to say one thing

People don't speak in tidy labels. The single idea “the hands are wrong” shows up as “six fingers,” “melty hands,” “too many knuckles,” “fused fingers” - and a hundred more.

The obvious approach is to search for matching words. But meaning and words come apart fast. Here are two comments that plainly mean the same thing. Which words do they actually share?

the hands look melty

means the same as ↓

weird fused fingers

Shared meaningful words: 0 - keyword matching finds nothing in common.

Two paraphrases of the same indicator, broken into words. Highlighted = shared. Almost always there's no overlap - so counting words would split one indicator into dozens, or miss it entirely.
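For readers who like to see it in code, here is a minimal sketch of that word-overlap check. The stop-word list and the function names are made up for illustration; this is not the project's actual code.

```python
# Hypothetical stop-word list, for illustration only; a real one would be longer.
STOP_WORDS = {"the", "a", "an", "is", "are", "look", "looks"}


def meaningful_words(phrase: str) -> set[str]:
    """Lower-case the phrase, split on whitespace, and drop stop words."""
    return {word for word in phrase.lower().split() if word not in STOP_WORDS}


def shared_words(a: str, b: str) -> set[str]:
    """Return the meaningful words the two phrases have in common."""
    return meaningful_words(a) & meaningful_words(b)


overlap = shared_words("the hands look melty", "weird fused fingers")
print(len(overlap))  # 0
```

Both phrases are about hands, yet the intersection is empty, which is exactly the failure keyword matching runs into.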

We need a way for the computer to see meaning, not spelling.

2 · A model that pulls out the indicators

First, someone (or something) has to read a comment and say what indicator it names. This project used a small language model (gemma3:4b, running locally). It read one comment and returned the indicators as short phrases:

“lol no way this is real, look at her hands - she's got like six fingers and the shadows are all over the place”

extracts →

six fingers, wrong shadows

The model reads the text and lists the visual indicators the comment cites.
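A rough sketch of what that step could look like in code. The prompt wording, the one-phrase-per-line reply format, and every name here are assumptions for illustration; the real prompt and parsing are not shown on this page. The model call is passed in as a plain function, so the sketch runs without a model installed.

```python
from typing import Callable

# Hypothetical prompt; the project's real prompt is not shown on this page.
PROMPT_TEMPLATE = (
    "List the visual indicators this comment cites as evidence that an image "
    "is AI-generated or real. Return one short phrase per line, or NONE.\n\n"
    "Comment: {comment}"
)


def parse_indicators(raw: str) -> list[str]:
    """Turn the model's reply into clean phrases, one per non-empty line."""
    phrases = []
    for line in raw.splitlines():
        # Strip whitespace and any leading list bullet the model may add.
        phrase = line.strip().lstrip("-*• ").strip().lower()
        if phrase and phrase != "none":
            phrases.append(phrase)
    return phrases


def extract_indicators(comment: str, generate: Callable[[str], str]) -> list[str]:
    """Ask the model (any prompt -> text function) for the comment's indicators."""
    return parse_indicators(generate(PROMPT_TEMPLATE.format(comment=comment)))


def fake_model(prompt: str) -> str:
    """Stand-in for the local model so the example is self-contained."""
    return "- six fingers\n- wrong shadows\n"


comment = "lol no way this is real, look at her hands - she's got like six fingers"
print(extract_indicators(comment, fake_model))  # ['six fingers', 'wrong shadows']
```

Swapping `fake_model` for a function that calls a locally running model is the only change needed to use it for real.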

This works, but it has two limits: it's slow and it's costly. Each call handles one comment, so the project runs the model only on a sample of a few thousand.

How we choose that sample matters. We don't read random comments - only ones likely to be discussing authenticity at all, picked by a deliberately broad keyword net: AI, real, fake, generated, obvious, look. These are topical words on purpose - not visual-indicator words. Filtering the sample for visual-indicator words (finger, shadow, lighting) would be circular: you cannot discover which indicators people use if you only read comments that already mention the indicators you guessed. So the net catches “is this real or AI?” talk, and lets the model find whatever indicator is actually there (and occasionally a couple that aren't).
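The keyword net can be sketched in a few lines. The six words come from the list above; whole-word, case-insensitive matching is my assumption here, and the names are hypothetical.

```python
import re

# The topical words listed above. Matching them as whole words,
# case-insensitively, is an assumption made for this sketch.
TOPICAL_WORDS = ["ai", "real", "fake", "generated", "obvious", "look"]
NET = re.compile(r"\b(?:" + "|".join(TOPICAL_WORDS) + r")\b", re.IGNORECASE)


def in_net(comment: str) -> bool:
    """True if the comment contains at least one topical word."""
    return NET.search(comment) is not None


print(in_net("Is this real or AI?"))            # True
print(in_net("her fingers are fused together"))  # False
```

Note that the second comment names an indicator but is not caught by the net: the net only picks the sample for the slow model, and comments like this one are reached later through the map. Whole-word matching also skips variants such as “looks”; a looser pattern would catch more at the cost of matching “ai” inside words like “air.”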

So after this step we have a few thousand comments, each with real indicator-phrases - but those phrases are still just words, with the paraphrase problem from §1. Time to fix meaning.

3 · Turning meaning into a place on a map

Here's the key trick. Imagine a giant map where every phrase gets an address, and phrases that mean similar things live close together - same street - while unrelated ones live in different cities. That address (a long list of numbers; really 768 of them) is called an embedding. A second little AI - an embedding model (nomic-embed-text, also running locally) - learned to place text on this map by reading a mountain of writing.

Below is a flattened, 2-D sketch of such a map for some real comment-phrases. Notice the clumps: all the hand complaints sit together, all the shadow ones together - even though they share no words. The big ◆ markers are our known indicators; the small dots are individual comments.

Drag the crosshair anywhere (or hit a button to jump it to an indicator). Everything inside the circle is “close enough to count as the same thing.” Drag the slider to change how close “close” has to be.

Match threshold: 0.73 (balanced)

7 comments are within range of the crosshair right now: six fingers, mangled hands, extra finger, fused fingers, weird fingers, too many knuckles, melted text. Park it on an indicator and watch it gather that indicator's paraphrases.

To check whether two phrases mean the same thing, the computer just measures how close their addresses are. Close = same idea. The 0.73 you saw on the slider is the cutoff, and it works as a closeness score rather than a raw distance: higher means closer. Crank it up and you demand near-identical meaning (you miss loose paraphrases); loosen it and you sweep in more, including the occasional wrong one.
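Here is a minimal sketch of that comparison, assuming the closeness score is cosine similarity (a common choice for embeddings; this page doesn't say which measure the project uses). The three-number vectors are toy stand-ins for real 768-number embeddings.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector is not similar to anything
    return dot / (norm_a * norm_b)


THRESHOLD = 0.73  # the slider value from the figure


def same_idea(a: list[float], b: list[float], threshold: float = THRESHOLD) -> bool:
    """True if the two addresses are close enough to count as one idea."""
    return cosine_similarity(a, b) >= threshold


# Toy 3-number "embeddings", invented for illustration.
melty_hands = [0.9, 0.1, 0.2]
fused_fingers = [0.8, 0.2, 0.1]
wrong_shadows = [0.1, 0.9, 0.3]

print(same_idea(melty_hands, fused_fingers))  # True
print(same_idea(melty_hands, wrong_shadows))  # False
```

Raising `threshold` towards 1.0 is the code equivalent of cranking the slider up.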

4 · Letting the indicators find their own comments

Now the payoff. The first model only saw a sample. But every comment can be placed on the map cheaply. So for each known indicator (the ◆ markers), we simply ask: which comment-dots are nearby? - and tag them all, no re-reading required. Flip the figure above to “Tag near every indicator.”

This is semantic expansion. A comment that says “her fingers are all fused together” never typed the word “hands,” but it lives right next door to the ◆ hands indicator - so it gets counted. That's how a tiny sample grows into broad coverage, and why the counts reflect what people actually said rather than just the few comments the model had time for.

One guard rail: expansion only looks at comments long enough to be describing something (at least 20 characters) and not posted by a bot. A lone 👍 or lol sits near plenty of indicators on the map but isn't evidence of anything, so it's skipped. Without that floor a vague seed like “AI voice” would hoover up thousands of one-word reactions and drown out the real signal.
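Semantic expansion plus the guard rail, as a minimal sketch. It assumes cosine similarity as the closeness measure and uses two-number toy vectors; the `Comment` class and function names are hypothetical, not the project's real code.

```python
import math
from dataclasses import dataclass


@dataclass
class Comment:
    text: str
    vector: list[float]  # the comment's address on the map
    is_bot: bool = False


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def expand(seeds: dict[str, list[float]], comments: list[Comment],
           threshold: float = 0.73, min_chars: int = 20) -> dict[str, list[str]]:
    """Tag every eligible comment with each seed it sits close enough to."""
    tagged: dict[str, list[str]] = {name: [] for name in seeds}
    for comment in comments:
        # Guard rail: skip bots and anything shorter than min_chars.
        if comment.is_bot or len(comment.text) < min_chars:
            continue
        for name, seed_vector in seeds.items():
            if cosine(seed_vector, comment.vector) >= threshold:
                tagged[name].append(comment.text)
    return tagged


seeds = {"hands": [1.0, 0.0]}
comments = [
    Comment("her fingers are all fused together", [0.9, 0.1]),
    Comment("lol", [0.95, 0.05]),  # near the seed, but too short
    Comment("the shadows point in two directions", [0.1, 0.9]),
    Comment("I am a bot, and this action was automatic", [0.9, 0.1], is_bot=True),
]
print(expand(seeds, comments))
# {'hands': ['her fingers are all fused together']}
```

The “lol” comment is closer to the seed than anything else, which is why the length floor is checked before any distance is measured.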

Where the ◆ indicators come from: seeds

Those ◆ markers have a name: seeds. A seed is a known indicator that semantic expansion reaches out from - each one gathers the comment-dots in its neighbourhood. Seeds aren't hand-listed up front: the pipeline builds them automatically by taking the ~200 most-frequently extracted phrases and sorting each into a category - that's the taxonomy. A curator (me, in this case) can also seed a phrase the model never surfaced, or un-seed a bad one. Toggle the seeds below and watch coverage shrink and grow - the grey non-indicator cluster is never seeded, so it stays dark:

25 comments gathered. Only seeded ◆ indicators (filled) reach out; an un-seeded one (hollow) goes dark, and so does its neighbourhood. More seeds → more coverage - which is why the taxonomy, and the curator's seed edits, decide how much the map can see. (We'll tidy these scattered phrasings into merged groups next.)
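Seed building can be sketched like this. The top-200 cut-off comes from the text above; how each phrase is actually sorted into a category isn't described here, so the categoriser is passed in as a function and the toy one below is purely hypothetical.

```python
from collections import Counter
from typing import Callable, Iterable


def build_seeds(extracted_phrases: Iterable[str],
                categorize: Callable[[str], str],
                top_n: int = 200,
                added: frozenset = frozenset(),
                removed: frozenset = frozenset()) -> dict[str, str]:
    """Map each seed phrase to its category.

    Seeds are the top_n most frequent extracted phrases, plus any the
    curator added, minus any the curator removed.
    """
    counts = Counter(extracted_phrases)
    phrases = [phrase for phrase, _ in counts.most_common(top_n)]
    phrases += [p for p in sorted(added) if p not in phrases]
    return {p: categorize(p) for p in phrases if p not in removed}


def toy_categorize(phrase: str) -> str:
    """Hypothetical stand-in for the real taxonomy step."""
    return "hands" if "finger" in phrase else "other"


extracted = ["six fingers"] * 3 + ["wrong shadows"] * 2 + ["looks fake"]
seeds = build_seeds(extracted, toy_categorize, top_n=2,
                    added=frozenset({"melted text"}),
                    removed=frozenset({"wrong shadows"}))
print(seeds)  # {'six fingers': 'hands', 'melted text': 'other'}
```

With `top_n=2`, “looks fake” never becomes a seed, the curator's “melted text” is added by hand, and “wrong shadows” is un-seeded.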

5 · From a teaspoon to the whole ocean

The scale gap is what makes this worthwhile. Reading a comment with the language model is slow, so only about 15,990 were ever read that way. Placing a comment on the map is cheap, so every comment can get an address - all 912,187 of them. The bar shows how much of the corpus is mapped:

Mapped: 912,187 / 912,187 (100.0%)

Read by the language model: ≈15,990 (the thin mark)

The thin mark is the sliver a human-speed reader could cover; the fill is what cheap embeddings reach. Because every comment has a map address, semantic expansion can draw on the whole corpus instead of just the sample.
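To put a number on how thin that mark is, here is the arithmetic using the two figures quoted above (the 15,990 is itself approximate, so treat the result as a rough share):

```python
total_comments = 912_187  # every comment has a map address
read_by_model = 15_990    # approximate count read by the language model

share = read_by_model / total_comments
ratio = total_comments / read_by_model

print(f"{share:.2%}")  # 1.75%
print(round(ratio))    # 57
```

So the language model read under 2% of the corpus, or roughly one comment in every 57.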

6 · Tidying the map

Two messes remain, and both are human-in-the-loop.

Merging synonyms into one canonical group

First, the same indicator is scattered across synonyms, so its count is split several ways. We merge the variants into one canonical indicator and re-point every comment to it. They collapse into a single entry everywhere on the Explore side - Top indicators, Inspect, and Semantic matches - and their counts combine. (Each raw phrase is still curated on its own - and a merged group is itself just a tidied seed.) Merging is done on the Merge page. Press the button:

Five scattered phrasings - wrong hands, six fingers, mangled fingers… - collapse into a single canonical indicator. Now the count reflects the real concept instead of splitting across spellings.
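In code, a merge is little more than a lookup table from raw phrase to canonical name. The sketch below uses made-up counts and hypothetical names; it only illustrates how split tallies combine.

```python
from collections import Counter


def merge_counts(raw_counts: dict[str, int],
                 canonical_of: dict[str, str]) -> Counter:
    """Re-point each raw phrase to its canonical indicator and sum the counts.

    Phrases with no entry in canonical_of keep their own name.
    """
    merged: Counter = Counter()
    for phrase, count in raw_counts.items():
        merged[canonical_of.get(phrase, phrase)] += count
    return merged


# Invented counts, for illustration only.
raw_counts = {"wrong hands": 4, "six fingers": 7, "mangled fingers": 2,
              "wrong shadows": 5}
canonical_of = {"wrong hands": "hands", "six fingers": "hands",
                "mangled fingers": "hands"}

merged = merge_counts(raw_counts, canonical_of)
print(merged.most_common())  # [('hands', 13), ('wrong shadows', 5)]
```

Before the merge, “wrong shadows” would have ranked second behind “six fingers”; afterwards the three hand phrasings count as one.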

Second, the model looking for indicators sometimes returns a phrase that isn't a visual indicator at all - a vague verdict like “looks obviously fake.” Left on the map, semantic expansion would drag hundreds of reaction comments to it. So a curator (again, in this case me) marks it Noise - the equivalent of pulling that house off the map. Expansion then skips it forever, and (because the decision is written into the master map) it stays gone even after future runs. That grey “not an indicator” blob in the big figure is exactly what Noise looks like: present, but never matched against comments.
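A minimal sketch of a Noise decision that survives future runs. Storing the decisions as a JSON list in a file is an assumption made for illustration; the format of the real master map isn't described on this page, and the function names are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def load_noise(path: Path) -> set[str]:
    """Read the saved Noise phrases, or an empty set if none are saved yet."""
    if not path.exists():
        return set()
    return set(json.loads(path.read_text(encoding="utf-8")))


def mark_noise(path: Path, phrase: str) -> None:
    """Add a phrase to the saved Noise list so later runs still skip it."""
    noise = load_noise(path)
    noise.add(phrase)
    path.write_text(json.dumps(sorted(noise)), encoding="utf-8")


def active_seeds(seeds: list[str], noise: set[str]) -> list[str]:
    """Seeds that expansion is allowed to reach out from."""
    return [seed for seed in seeds if seed not in noise]


with tempfile.TemporaryDirectory() as folder:
    noise_file = Path(folder) / "noise.json"  # hypothetical file name
    mark_noise(noise_file, "looks obviously fake")
    # A later run reloads the decision instead of asking the curator again.
    remaining = active_seeds(["six fingers", "looks obviously fake"],
                             load_noise(noise_file))

print(remaining)  # ['six fingers']
```

The phrase is filtered out before expansion starts, so it never gets the chance to gather reaction comments.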

What you end up with

Put it together - read a sample, place everything on the map of meaning, let indicators gather their own comments, then manually merge and de-noise - and the pile of 912,187 arguments becomes a perspective on which indicators people actually rely on.

In this dataset, from my perspective, the most-cited indicator is Hands with 2,637 associated comments.

See the live results →

Inspect a single indicator →

Run it yourself (Run book) →