Do large language models believe?

Abstract

In this paper, I ask whether large language models (LLMs) have beliefs. The question is important, because the capacity to believe is widely assumed to be a prerequisite for rationality. The question is also difficult, not only because there is much yet to be learned about the capacities of LLMs, and about the mechanisms that underlie those capacities, but also because the basic nature of belief is subject to deep philosophical dispute. The ambitions of the paper are accordingly modest. I set out to (i) distinguish two opposing views about the nature of belief, and (ii) work out what each view implies about the prospects for belief in LLMs.

Representationalist theories of belief say that what makes it the case that an agent has beliefs is a set of facts about the structure of that agent’s mental states (Quilty-Dunn and Mandelbaum, 2018). Superficialist theories of belief, in contrast, say that what makes it the case that an agent has beliefs is a set of facts about that agent’s actual and counterfactual patterns of behavior (Schwitzgebel, 2023). You might think that, since LLMs can mimic human verbal behavior, but are unlike humans on the inside, superficialists should say that LLMs have beliefs, and representationalists should say they don’t. I argue that this assignment of views is almost exactly backwards.

The paper offers two arguments that representationalists ought to ascribe belief to LLMs. The first is based on the view that “any test that is diagnostic of ToM (theory of mind) will be diagnostic of the capacity for belief” (Porot and Mandelbaum, forthcoming), along with recent evidence that LLMs do have a theory of mind. The second argument, inspired by Hacking’s (1983) experimental realism, is based on interpretability work that purports to show not only that belief-like representations can be decoded from activation patterns inside LLMs (Burns et al., 2022), but also that it is possible to systematically alter the LLMs’ belief-like assertions by intervening on those activation patterns (Li et al., 2023).

The paper also offers two arguments that superficialists ought not to ascribe belief to LLMs. The first argument is a reductio. If LLMs do have beliefs, we can look at any given assertion and ask whether the LLM really believes it. But which empirical facts might ground such a distinction? In the human case, the distinction is grounded in the possibility of observing logical and practical inconsistencies between (i) the assertion in question and (ii) an extended sample of both verbal and non-verbal behavior. The crux of the argument is that, in the case of LLMs, no such extended inconsistency search is possible: the mechanism that allows LLMs to mimic rational consistency is limited to the length of the so-called context window, which is itself limited for the non-trivial reason that, in transformer architectures, computational demand scales quadratically with context length (Vaswani et al., 2017). The second argument says that superficialists ought to accept a hitherto unnoticed principle according to which the justification for belief ascription depends on a kind of proportionality between the complexity of an agent’s beliefs and the complexity of its desires. Since LLMs have no substantive desires, ascribing beliefs to them violates this principle.

The paper concludes with a warning. Although we should be open to evidence that LLMs have beliefs, we should not be open to the view that LLMs can take responsibility for the beliefs they hold.

Date
Nov 9, 2023
Location
Erlangen, Germany