| MIND · MATTER · MEANING | No. 36 · May 2026 |
The Word “Hallucination” Was Already Taken
A system with no inside can’t hallucinate — only drift.
| An essay | mindmatterandmeaning.com |
A chatbot hands you a footnote. It cites a paper that does not exist. The author it names never wrote anything close to the title. The journal volume runs ten issues short, and the page numbers point into empty air. You copy the citation into a search engine, find nothing, and go back to the chat window with the now-standard complaint: it hallucinated again. Everyone in this little drama uses the word as if it carried no philosophical freight at all — as if the engineers had flipped through the dictionary, hunting for something punchy, and simply landed on the right term.
They did not land on the right term. They landed on a word that already had a job, and the job mattered.
In philosophy, hallucination names something quite specific. A person has an experience that seems, from the inside, to present a real object (a pink rat on the kitchen counter, a friend at the foot of the bed) when no such object is there. The experience happens. The object does not. And the whole difficulty lives in that word “seems.” The hallucinator looks out at what feels like a world and meets no resistance in it; nothing on the inside of the experience whispers that it has failed.[1] Philosophers disagree, sometimes fiercely, about what that appearance, the one a hallucination shares with genuine seeing, comes to.[2] But every account worth having agrees on one thing: a hallucination happens to someone. It needs a subject who seems to see.
Now look at what the engineer means. A model trained on text produces a string that mentions a paper it has never encountered. No seeming is involved. Nothing inside the system scans a row of citations and concludes, falsely, that one of them is real. The model has no vantage point from which the false output looks like a world. It does not have a world. It has a distribution over tokens, shaped by your prompt, and a sampling step that picked one path through that distribution rather than another. The output strays from the truth because nothing in its training rewarded tracking the truth this finely. The phenomenon is real, and it deserves a name. Hallucination simply names the wrong shape.
Why did the word stick? Partly because it sounds clinical and forgiving at the same time. It pathologizes the model gently, as though it had caught a passing fever. The alternative is to say plainly what every text-only system does: it strings together plausible continuations whether or not those continuations track anything. And partly the word stuck because it smuggles in a familiar picture: a mind, turned inward, deceived. The old Cartesian theater reopens for one more show, this time staged inside a server rack. By sheer suggestion, the model becomes a tiny subject, now and then misled. Once that picture takes hold, the question “How do we stop the model from hallucinating?” sounds answerable, the way medicating a patient sounds answerable. The harder question, the one the picture hides, never even gets asked.
Here is that harder question. To misrepresent anything, a system has to be tied to the world tightly enough that something fixes when it succeeds and when it fails. The teleosemantic tradition locates that tie in a system’s biological or designed function: a state misrepresents when it fires outside the conditions it was built to track.[3] Searle gets to a kindred conclusion by another road: genuine meaning shows up only where formal symbol-shuffling connects to real causal, embodied dealings with the world.[4] Either way, misrepresentation is an achievement. A thing has to first be the kind of thing that can represent. Only then can it, on a given occasion, get something wrong.
A text-only language model has no such footing to lose. Its outputs ride on statistical patterns combed out of a corpus, and the corpus stands in for the world only in the thin sense that the humans who wrote it were writing about the world. Nothing in the model’s loop checks its outputs against any state of affairs out there. The very idea of the model getting it wrong imports a yardstick the model cannot hold. We hold it for it. We are the ones who notice the missing journal issue. The model notices nothing.
Some philosophers push back on this hard line, and they deserve a hearing. Marek Havlík argues for what he calls semantic fragmentism: the claim that language models do achieve real meaning, not everywhere, but within bounded patches of language where their training is dense and coherent.[5] The view is trying to honor something obvious: the gulf between ELIZA shuffling canned phrases and a modern model translating, summarizing, and holding a dozen constraints in the air at once. Fair enough. But fragmentism still owes us an account of what fixes meaning inside those patches. If the answer is use within a corpus, it has only moved the form/meaning gap somewhere harder to see. If the answer is grounding in the world, it has conceded the whole point.[6]
None of this makes the engineering problem vanish once we take the philosophical word back. The problem stays. It just gets more honest. What the model does, when it fabricates a citation, is closer to confabulation — a word we already use for fluent narration produced without access to the facts the narration claims to report. Or, more plainly still: drift from a standard the system cannot detect. Neither phrase will ever move a product launch. Both have the modest merit of being true.
The cost of the borrowed word shows up in the questions the field lets itself ask. Ask whether a model can be made to stop hallucinating, and you have quietly assumed it has a grip on the world that slipped — and that the right tweak will tighten it. Ask instead what a system would need before its outputs counted as representations at all, and you walk straight into the harder country: sensors, a body, a causal history, the long apprenticeship through which a creature comes to mean cat by the word “cat.” Better questions tend to make better engineering. They also, as it happens, make better metaphysics.
No one is giving the word back. The AI industry does not borrow vocabulary and then return it, and there is something almost endearing about the theft — a field moving so fast it will cheerfully lift a clinical term from the discipline next door and call the lifting naming. But look at what the borrowing does. It plants, at the dead center of the most consequential technology story of the decade, a word that describes an inner theater inside a system that has no inside. The pretense earns its keep. It quietly props up the very confusion this whole project has been working, patiently, to take apart. A model that hallucinates sounds like a mind on the mend. A model that drifts from facts it cannot detect sounds like exactly what it is. And once we hear it as what it is, we can finally ask the real question: what would have to be added before a system could be capable of getting anything wrong at all?
Notes
1. Tim Crane, “Is There a Perceptual Relation?”, in Perceptual Experience, ed. Tamar Szabó Gendler and John Hawthorne (Oxford: Oxford University Press, 2006), 126–146; “Introspection, Intentionality, and the Transparency of Experience,” Philosophical Topics 28 (2000): 49–67; and “The Problem of Perception,” Stanford Encyclopedia of Philosophy (rev. 2021, with Craig French). On Crane’s intentionalist treatment, hallucination is a representational state whose content fails to match the world; the phenomenal character of the state arises from how it represents, not from any inner object the subject is alleged to inspect. This sits within strong representationalism and explains why a hallucination seems to present a worldly object: it represents one, just inaccurately. The treatment is congenial to the present essay’s claim that hallucination is a content-failure of a world-directed state, and it is the account the main text of Chapter 4 develops and Ch04.2 (“The Pain in the Toe That Isn’t There”) applies to phantom limb. The point of citing it here is to fix the philosophical meaning of hallucination before the engineering metaphor co-opts the word.
2. M.G.F. Martin, “The Transparency of Experience,” Mind and Language 17 (2002): 376–425, and “On Being Alienated,” in Perceptual Experience, ed. Tamar Szabó Gendler and John Hawthorne (Oxford: Oxford University Press, 2006), 354–410. Martin’s disjunctivism denies the common factor assumption, namely that veridical perception and a subjectively indistinguishable hallucination share a metaphysically substantive mental state. On his view a hallucination is characterized only negatively, as a state indistinguishable through reflection from a veridical perception of a particular kind. The book sits closer to Crane’s intentionalist account than to Martin’s negative epistemic disjunctivism (see Ch. 3 on the argument from illusion), which is exactly the disagreement the main text gestures at with “sometimes fiercely.” Both camps nonetheless converge on the point that does the work here: the philosophical use of hallucination requires a subject who seems to encounter a world.
3. Ruth Garrett Millikan, Language, Thought, and Other Biological Categories (Cambridge, MA: MIT Press, 1984), chs. 1–2; and Karen Neander, A Mark of the Mental: In Defense of Informational Teleosemantics (Cambridge, MA: MIT Press, 2017). Neander’s “informational teleosemantics” extends Millikan’s framework by tying representational content to the conditions a system is functionally adapted to detect (its informational functions, which carry what she calls “normative aboutness”) rather than to the conditions it merely happens to correlate with. The misrepresentation case then comes out clean: a state misrepresents when it occurs outside the conditions its function was selected to track. The frog’s bug-detector firing at a passing pellet (Chapter 6’s example) is the canonical illustration. Crucially for the present argument, neither Millikan nor Neander offers any route by which a text-only LLM could misrepresent, since on neither account can a representational achievement be inherited without the selection history that grounds it (see Ch06.1 for the developed argument).
4. John Searle, “Minds, Brains, and Programs,” Behavioral and Brain Sciences 3 (1980): 417–424; and “Is the Brain a Digital Computer?”, Proceedings and Addresses of the American Philosophical Association 64 (1990): 21–37. Searle’s two-step argument, first that syntax does not yield semantics (the Chinese Room) and then that computation is itself observer-relative, is the spine of Chapter 9’s case against treating LLMs as understanders. (A guarded note for the careful reader: the Chinese Room alone does not establish the strong, fully general claim that no syntactic process could ever yield semantics; the book leans on the observer-relativity argument, not the thought experiment in isolation, to carry that weight.) The point relevant here is narrower. Searle’s distinction between systems with genuine, world-grounded semantic content and systems that merely emit semantic-shaped tokens makes the engineering term hallucination a category mistake: a system without content to begin with has no content to misrepresent. What it has is output drift relative to an external standard.
5. Marek Havlík, “Meaning and Understanding in Large Language Models,” Synthese 204 (2024). Havlík’s “semantic fragmentism” (developed in his §3.7) is the more sympathetic edge of the contemporary LLM-meaning debate: rather than denying LLMs any relation to content, he argues that they achieve bounded semantic competence within domains where their training distribution is dense and coherent. The book grants the empirical observation (modern models really do show competence gradations across domains) but resists the inference that domain-bounded statistical coherence amounts to genuine semantic content. The fragmentist position trades on the same conflation Bender and Koller diagnose (next note): the slide from “the form is right” to “the meaning is there.” A useful contrast piece is Jumbly Grindrod, “Large Language Models and Linguistic Intentionality,” Synthese 204:71 (2024), discussed at length in Ch06.1.
6. Emily M. Bender and Alexander Koller, “Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data,” in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (2020), 5185–5198. Their “octopus test” is the cleanest engineering-side statement of the syntax/semantics gap for large language models: a hyperintelligent octopus that taps an undersea cable carrying two islanders’ chatter could learn every statistical regularity in their messages without ever encountering a coconut or a predator, and would still produce fluent replies it does not understand, until the day real help is needed and the fluency fails. The argument recasts Searle’s Chinese Room in distributional-semantics terms and has been much discussed in the NLP literature without being much heeded. The form/meaning gap they diagnose is the same gap this essay turns on: textual coherence is not world-grounded representation, and a system that lacks the latter cannot, strictly, misrepresent in the way hallucinate implies.