Uncanny Atlas

Pipeline status

How the corpus narrows from every collected comment down to the ones that name a specific visual indicator — and how those indicators were found. A read-only snapshot of extraction, embedding, and expansion.

Coverage funnel

Collected 912,187 · 100.0% the whole corpus

Embedded 912,187 · 100.0% of collected

Candidate comments 473,708 · 51.9% of collected

Read by the model 15,990 · 3.4% of candidates

Flagged as citing an indicator 29,479 · 3.8% of eligible, 3.2% of all

Citing a curated tell 18,658 · 2.4% of eligible
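
The percentages in the funnel can be re-derived from the raw counts. The sketch below is a minimal check, not the dashboard's own code: the constant names and the `pct` helper are made up for illustration, and the eligible count (777,779) is taken from the "Candidate keyword filter" section.

```python
# Counts copied from the funnel; ELIGIBLE comes from the keyword-filter section.
COLLECTED = 912_187
CANDIDATES = 473_708
READ_BY_MODEL = 15_990
FLAGGED = 29_479
CURATED = 18_658
ELIGIBLE = 777_779


def pct(part: int, whole: int) -> str:
    """Format part/whole as a percentage with one decimal place."""
    return f"{100 * part / whole:.1f}%"


# Each entry pairs a funnel stage with the denominator the page names for it.
funnel = {
    "candidates of collected": pct(CANDIDATES, COLLECTED),
    "read of candidates": pct(READ_BY_MODEL, CANDIDATES),
    "flagged of eligible": pct(FLAGGED, ELIGIBLE),
    "flagged of all": pct(FLAGGED, COLLECTED),
    "curated of eligible": pct(CURATED, ELIGIBLE),
}

for stage, share in funnel.items():
    print(f"{stage}: {share}")
```

Note that the stages use different denominators (collected, candidates, eligible), so the percentages are not directly comparable from one row to the next.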

Candidate keyword filter

This keyword pre-filter defines the candidate comments stage above (the pool the LLM samples from). A comment must mention at least one of the keywords below and pass the ≥20-char and non-bot checks. Semantic expansion uses a broader gate: the same ≥20-char and non-bot checks but no keyword requirement, so it can reach comments that describe a tell without these words. That gate admits 777,779 eligible comments. The ≥20-char floor stops it from matching one-word and emoji reactions; without it, a generic seed like "AI voice" would vacuum up thousands.

AI · real · fake · generated · obvious · look
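
The two gates described above can be sketched as a pair of predicates. This is a minimal illustration under stated assumptions, not the pipeline's actual filter: it reads the keyword line as six separate keywords, assumes case-insensitive whole-word matching, and uses a hypothetical `is_bot` flag because the page does not describe how bots are detected.

```python
import re
from dataclasses import dataclass

# Keywords as read from the list above; the matching rule is an assumption.
KEYWORDS = ("AI", "real", "fake", "generated", "obvious", "look")
MIN_CHARS = 20  # the >=20-char floor

# Assumed rule: case-insensitive, whole-word match on any keyword.
_KEYWORD_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in KEYWORDS) + r")\b",
    re.IGNORECASE,
)


@dataclass
class Comment:
    text: str
    is_bot: bool = False  # hypothetical flag; the real bot check is not described


def is_eligible(comment: Comment) -> bool:
    """Broader gate used by semantic expansion: length and non-bot checks only."""
    return len(comment.text.strip()) >= MIN_CHARS and not comment.is_bot


def is_candidate(comment: Comment) -> bool:
    """Candidate gate: eligible, and mentions at least one keyword."""
    return is_eligible(comment) and _KEYWORD_RE.search(comment.text) is not None
```

Under these assumptions, a comment such as "Six fingers on the left hand again" is eligible for semantic expansion but is not a candidate, which is the gap the broader gate is meant to cover.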

How the indicators were found

LLM-extracted 18,474

Semantic matches 24,042

Keyword expansion

Taxonomy & curation

Taxonomy indicators 189

Embedded indicators 189

Indicator aliases 671

Pending re-expansion

Recent extraction runs

Batch	Model	Started	Completed	Sample	Processed
d2cb1034	gemma3:4b	2026-06-04T09:15:28	2026-06-04T09:49:42	8000	7995
215c54ba	gemma3:4b	2026-06-03T11:53:42	2026-06-03T12:28:35	8000	7995
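
Run duration, throughput, and the number of sampled comments left unprocessed follow directly from the table. The sketch below is illustrative only: the rows are copied from the table, while `RUNS` and `summarize` are hypothetical names. The two processed counts sum to 15,990, which appears to match the "Read by the model" figure in the coverage funnel.

```python
from datetime import datetime

# Rows copied from the table: (batch, started, completed, sample, processed).
RUNS = [
    ("d2cb1034", "2026-06-04T09:15:28", "2026-06-04T09:49:42", 8000, 7995),
    ("215c54ba", "2026-06-03T11:53:42", "2026-06-03T12:28:35", 8000, 7995),
]


def summarize(started: str, completed: str, sample: int, processed: int) -> dict:
    """Derive wall-clock seconds, unprocessed count, and comments per second."""
    elapsed = datetime.fromisoformat(completed) - datetime.fromisoformat(started)
    seconds = elapsed.total_seconds()
    return {
        "seconds": seconds,
        "unprocessed": sample - processed,
        "per_second": processed / seconds,
    }


summaries = {batch: summarize(s, c, n, p) for batch, s, c, n, p in RUNS}
total_processed = sum(run[4] for run in RUNS)

for batch, info in summaries.items():
    print(batch, info)
print("total processed:", total_processed)
```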
