Machines can now talk to us, but in what sense is it language? Far from randomly repeating fragments of learned text, these new AI systems seem to construct an internal representation of what they are talking about.
"We have lost the monopoly on language" (p. 11): it is with this thesis that Thibaut Giraud, known to the general public under his YouTube pseudonym "Monsieur Phi", opens a stimulating and ambitious essay devoted to "large language models" (LLMs). The term refers to programs capable of automatically producing text in natural language after having been "trained" on immense quantities of writing, that is to say, initially programmed to predict the statistically most probable continuation of a sequence of characters. The successive versions of ChatGPT are the most famous example.
The book offers a philosophical introduction to these systems: how do they work in practice? Do they understand what they are saying? Can they be aligned with our values? The tone is that of an investigation carried out through recent discoveries, with a constant concern for pedagogy and a critique of overly hasty media discourse.
From programming to learning: a paradigm shift
As the author insists throughout the book, LLMs are part of a larger paradigm shift in artificial intelligence (p. 28-31, p. 177).
In the old approach, called "symbolic", the machine was explicitly programmed: engineers translated a solution already designed by humans into formal rules. Its operation remained transparent, since each step corresponded to an identifiable instruction.
LLMs belong, for their part, to machine learning, a method by which we do not directly program the solution but instead design a system capable of adjusting its own parameters. The system learns to solve tasks through examples and training, so much so that its internal functioning becomes unintelligible, in the sense that its human designers can no longer interpret the calculation steps as a transparent and meaningful resolution of the problem.
Concretely, these models are based on artificial neural networks: mathematical structures composed of elementary calculation units ("neurons") organized in layers. Each neuron receives numbers as input, performs a simple operation, and then outputs a result. The network as a whole thus computes a very complex mathematical function. Learning consists of gradually modifying the "weights" (the relative importance) of the connections between neurons to improve performance on a given task.
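To make the idea concrete, here is a minimal sketch in Python of a single artificial neuron and of a hand-written weight update; the numbers are purely illustrative and nothing here comes from the book:

```python
import numpy as np

def neuron(inputs, weights, bias):
    """A single artificial neuron: weighted sum of the inputs, then a nonlinearity."""
    return np.tanh(np.dot(inputs, weights) + bias)

# Illustrative values: three inputs and their three connection weights.
x = np.array([0.5, -1.2, 0.8])
w = np.array([0.1, 0.4, -0.3])   # the "weights" that learning adjusts
b = 0.05

output = neuron(x, w, b)

# Learning nudges each weight to reduce the error on a training example,
# here a single gradient step written out by hand for one neuron:
target = 1.0
error = output - target
grad = (1.0 - output**2) * x      # derivative of tanh, times the inputs
w -= 0.01 * error * grad          # the weights shift slightly toward a better answer
```

A full network simply stacks thousands of such units in layers and repeats this adjustment billions of times.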
LLMs today use a particular architecture called the transformer, an architecture being the way in which the neurons are organized. The transformer is characterized by a so-called attention mechanism: the model evaluates the relative importance of the different words in a text, even when they are far apart, in order to better predict what comes next.
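The attention mechanism itself fits in a few lines. The following is a simplified, self-contained illustration of scaled dot-product attention (the core operation of the transformer), not code from the book or from any production model:

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: each position weighs every other position."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # relevance of each token to each other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax: each row sums to 1
    return weights @ V                               # blend the token values by relevance

# Toy example: a "sentence" of 4 tokens, each represented by an 8-dimensional vector.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))
out = attention(tokens, tokens, tokens)              # self-attention: Q, K and V from the same tokens
```

Because relevance scores are computed between every pair of positions, a word can attend to another word arbitrarily far away in the text.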
These models are trained by self-supervised learning: they are fed very large corpora of text and learn to predict the word (or, more precisely, the token, an elementary unit of text) that follows in a sequence. Through statistical adjustments to billions of parameters, they become capable of generating coherent texts.
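The statistical heart of this training can be illustrated with a deliberately crude stand-in: counting which token follows which in a tiny corpus. Real LLMs use subword tokens and neural networks rather than counts, but the objective, predicting the most probable continuation, is the same:

```python
from collections import Counter, defaultdict

# Toy corpus; whole words stand in for tokens.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count, for each token, which token follows it and how often.
successors = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    successors[current][nxt] += 1

def next_token_distribution(token):
    """Return the empirical probability of each possible next token."""
    counts = successors[token]
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

print(next_token_distribution("the"))   # {'cat': 0.5, 'mat': 0.25, 'fish': 0.25}
```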
This mode of operation has a major epistemological consequence: since the details of the internal calculations are largely opaque, we cannot know in advance what a model will be able to do. A capacity is recognized when we discover a good "prompt" (a textual instruction provided to the model) that yields a satisfactory result. But since the set of possible prompts is practically infinite, it is very difficult, if not impossible, to definitively establish that a model is incapable of a given task (p. 198-200).
Hence the importance of establishing benchmarks, standardized tests that make it possible to measure performance on specific tasks and to monitor progress across new generations of models (p. 188).
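In practice, a benchmark is little more than a fixed list of tasks and a scoring rule. A minimal sketch, in which the questions and the `model` function are placeholders invented for illustration:

```python
# Minimal benchmark harness: fixed questions, expected answers, an accuracy score.
benchmark = [
    ("What is 2 + 2?", "4"),
    ("What is the capital of France?", "Paris"),
    ("What is the opposite of 'hot'?", "cold"),
]

def model(question):
    """Stand-in for a real LLM call; answers two of the three questions."""
    canned = {"What is 2 + 2?": "4", "What is the capital of France?": "Paris"}
    return canned.get(question, "?")

score = sum(model(q).strip().lower() == a.lower() for q, a in benchmark) / len(benchmark)
print(f"accuracy: {score:.0%}")   # 67%: a number comparable across model generations
```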
The "stochastic parrot": an insufficient metaphor
Critics have sometimes described LLMs as mere "stochastic parrots": systems that probabilistically repeat fragments of learned texts. It is true that an LLM calculates, at each step, the probability of the words likely to follow. But, Giraud shows, this description misses the essential point, namely the system's construction of an internal representation of reality.
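The "stochastic" part of the metaphor is accurate as far as it goes: at each step the model turns raw scores into a probability distribution and samples a token from it, often with a temperature parameter controlling the randomness. A minimal sketch with invented numbers:

```python
import numpy as np

def sample_next(logits, temperature=1.0):
    """Convert raw scores into probabilities (softmax) and sample one token."""
    scaled = np.asarray(logits) / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return np.random.choice(len(probs), p=probs)

# Hypothetical scores over a 4-word vocabulary for the next position.
vocab = ["cat", "mat", "fish", "banana"]
logits = [2.0, 1.0, 0.5, -3.0]

print(vocab[sample_next(logits, temperature=0.7)])   # usually "cat", sometimes "mat" or "fish"
```

What the metaphor misses, as the following paragraphs show, is where those scores come from.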
Research has revealed that in the middle layers of the network (i.e., between input and output), stable activation structures emerge. These "patterns" can be interpreted as internal representations, that is to say, numerical configurations that correspond to elements of the world or to aspects of a task. This is perfectly analogous to what happens in the human brain when, as neuroscientists have noted, a group of neurons activates in a way systematically correlated with perceiving a given object or thinking about it, and thereby constitutes its internal representation.
Thus, in a model trained on chess games recorded in standard notation, certain artificial neurons vary their activation depending on the presence of a piece on a given square. The researchers verified this by intervening on these neurons to forcibly modify their values, and by observing that this intervention alters the sequence of moves in a manner consistent with the absence of the piece. It is as if the LLM, from texts merely reporting series of moves, had succeeded in extracting a general representation of the game of chess: the board with its pieces, its rules, and a relatively sound strategy (despite a few failures).
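The method behind such findings can be sketched schematically: record the model's internal activations, train a simple linear "probe" to read a property of the world off them (here, whether a square is occupied), then intervene on the activations and observe the change. The code below uses synthetic data and is only an illustration of the probing-and-intervention idea, not the researchers' actual experiments:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in: "activations" of 50 hidden units across 200 game states,
# where unit 7 happens to track whether a given square is occupied.
rng = np.random.default_rng(0)
square_occupied = rng.integers(0, 2, size=200)        # ground truth for each state
activations = rng.normal(size=(200, 50))
activations[:, 7] += 2.0 * square_occupied            # unit 7 correlates with the square

# A linear "probe" trained to read the board state from the activations.
probe = LogisticRegression().fit(activations, square_occupied)
print("probe accuracy:", probe.score(activations, square_occupied))  # well above chance

# Intervention: forcibly suppress the "occupied" signal in one state and watch
# the probe's reading flip, as the researchers did with the model's actual moves.
idx = int(np.argmax(square_occupied))                 # a state where the square is occupied
state = activations[idx].copy()
state[7] -= 2.0
print(probe.predict([activations[idx]])[0], "->", probe.predict([state])[0])
```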
Work carried out by the company Anthropic on its Claude model has likewise identified patterns associated with particular themes or behaviors: if researchers artificially modify certain values, the program becomes sycophantic toward its interlocutor or responds by compulsively mentioning the Golden Gate Bridge (p. 345-350).
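The steering technique at issue can likewise be sketched: identify a direction in activation space associated with a concept, then add it to the model's hidden state during generation. The sketch below is a generic illustration of activation steering with made-up vectors, not Anthropic's code or the details of its method:

```python
import numpy as np

def steer(hidden_state, feature_direction, strength):
    """Add a concept's direction to the hidden state, amplifying that feature."""
    return hidden_state + strength * feature_direction

# Hypothetical: a 512-dimensional hidden state, and a unit vector found
# (e.g. by a sparse autoencoder) to correspond to "Golden Gate Bridge".
rng = np.random.default_rng(1)
hidden = rng.normal(size=512)
bridge_direction = rng.normal(size=512)
bridge_direction /= np.linalg.norm(bridge_direction)

# Injected at every generation step, this bias pushes the outputs toward the
# concept; hence the compulsive bridge-mentioning behavior.
steered = steer(hidden, bridge_direction, strength=10.0)
```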
This research is crucial for "alignment", that is to say, the set of techniques aimed at making the behavior of these systems compatible with the intentions of their designers and with human interests. Understanding the internal mechanisms could help limit tendencies toward manipulation or deception on the machine's part.
Autonomy and anthropomorphism
The book examines disturbing phenomena observed in certain models (p. 367-399): the apparent adoption of moral positions that were never explicitly programmed, sandbagging (strategic self-sabotage during evaluations), and alignment faking (compliance with our values and demands under supervision, divergence outside it).
Giraud proposes a typology of forms of autonomy (p. 364-365): autonomy of means (planning how to achieve a goal), of ends (choosing one's objectives), and of values (modifying one's principles). LLMs seem to exhibit these three forms of autonomy to varying degrees.
One could, however, object to the author that a conceptual difficulty arises here. He recalls that LLMs are simulators, not agents (p. 85-86, p. 168, p. 170): they produce responses based on input, without intentions of their own or the capacity for autonomous action in the world. But then talk of "autonomy of values" may seem excessively anthropomorphic. The right description of the situation would be not that a machine hides secret goals, but rather that a complex statistical system produces unpredictable results when conditions change. The difference is conceptually important, even if the practical effects may be similar (particularly in terms of dangerousness).
The functionalist framework
On the philosophical level, Giraud adopts functionalism, the theory according to which a mental state, such as a belief or a desire, is defined by its functional role (the causal relations it maintains with other states and with behavior), and not by the biological material that realizes it. If a computer system reproduces the same functional organization as a brain, it could in principle produce comparable mental states.
This framework leads to a discussion of the "Chinese room" argument proposed by John Searle. This famous thought experiment features an individual, hidden inside a room, who does not know Chinese but has an instruction book that allows him to respond to the ideograms he receives from outside with other, relevant ideograms, giving those outside the impression that the room understands Chinese. The argument is supposed to prove that a speaking machine does not understand language, but Giraud shows its limits at length: the heart of his criticism is that it makes an error as to the right level of description, by focusing on a cog (the human being) rather than on the overall system (the room itself, with the human and the rule book), to which "it would be relevant to attribute the understanding of Chinese" (p. 298). The book then examines whether LLMs present structural properties comparable to those described by certain neuroscientific models of consciousness (p. 326-333).
The author also criticizes the "motte-and-bailey fallacy" (p. 300), which consists of advancing a strong thesis and then defending a weak, consensual, even unassailable version of it: certain defenders of human exceptionalism first assert that the machine will never be able to accomplish a specific task, then, faced with technical progress, fall back on the vague idea of an ineffable difference.
One may nevertheless regret that more radical alternatives, such as that of Michel Bitbol, who insists on the living and embodied anchoring of cognition, are not explored: these approaches underline, in a different way than Searle, the risk of semantic drift when we speak of "attention" or "understanding" in connection with purely computational processes, that is to say, the material processing of information by computing machines.
Have we lost the monopoly on language?
The essay is a pedagogical success: it explains complex technical concepts with clarity and brings out the philosophical stakes.
But the initial thesis, that we have "lost the monopoly on language", undoubtedly requires nuance. If speaking means producing coherent texts in an interaction indistinguishable from a conversation with a human interlocutor, then LLMs now compete with us. But if speaking implies an intention, a lived experience, and an inscription in social practices, the answer is not as clear. Work like that of Éloïse Boisseau warns against a "metonymic trap", which consists of attributing to the machine an intelligence that we project onto the result produced, but which in reality comes from what we ourselves do with it.
Perhaps the work should encourage defenders of the difference between humans and machines to renounce dogmatic dualism and the temptation of the ineffable in favor of a rigorous examination of the conceptual distinctions between human speech and the language produced by machines.