A recent paper by Mark Coeckelbergh and David Gunkel in AI & Society has got me thinking. Since I know that David will now immediately think his work is done — getting us thinking is his goal — let me stress from the outset that it has mainly got me thinking that they’re wrong. Since their aim is “deconstructive”, however, telling them they’re “wrong” is not so straightforward. Properly speaking, in order to be wrong, there have to be facts of the matter and you have to say something about those facts, and deconstructive writing often resists being read that way. Still, I think I’ve found a sense in which Mark and David are simply and plainly wrong and I invite them to consider it here.
Although it may not be their explicit thesis, I take the underlying message of their paper to be that large language models constitute a “fundamental challenge to long-standing metaphysical, epistemological, and axiological assumptions,” which they sort, roughly, under the rubric of “logocentrism”, and which therefore also gives them an “opportunity” to remind us of the not-quite-so-long-standing but nonetheless familiar “postmodern” critique (or deconstruction) of these assumptions as found in the work of Barthes, Foucault, and Derrida. Specifically, they put the challenge of generative AI as follows: “these algorithms write without speaking, i.e. without having access to (the) logos and without a living voice.” This is the statement that I think is wrong. But I want to make clear that, although I’m not myself ashamed of my logocentrism, I don’t just think it is wrong on those “long-standing metaphysical assumptions” they propose to deconstruct. I want to offer a critique on Mark and David’s own terms.
I should say that I’ve tried this sort of thing before in my conversations with David about robot rights, with rather limited results. I disagree with him that we can “face” a robot as an “other” in Levinas’ sense; and I don’t think they provide the correlative “incidents” of “rights” in Hohfeld’s sense. As far as I can tell, he has so far been unmoved by my arguments, which are based both on my understanding of how robots work and my reading of Levinas and Hohfeld. Past failures notwithstanding, I can’t think of a better way to do critique than that, and I’m going to offer something similar here.
Mark and David pass somewhat lightly over how language models work, encouraging us to take them more or less at face-value (or to abolish any distinction between face-value and “real” value). But we have to remember that there are many ways to imagine a machine putting words together that we would not consider writing. In a previous post, I suggested that LLMs are not, in fact, “artificial intelligences”; they are merely “electronic articulators”; and I asked us to consider the following example of a (non-)”writing” machine:
Imagine you have three bags, numbered 1, 2, 3. In bag number 1 there are slips of paper with names on them: “Thomas”, “Zahra”, “Sami”, “Linda”. In bag number 2 there are phrases like “is a”, “wants to be a”, “was once a”, and so forth. In bag number 3, there are the names of professions: “doctor”, “carpenter”, “lawyer”, “teacher”, etc. You can probably see where this is going. To operate this “machine”, you pull a slip of paper out of each bag and arrange them 1-2-3 in order. You’ll always get a string of words that “make sense”. Can this arrangement of three bags write?
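For the curious, here is a minimal sketch of that three-bag machine in Python. The bag contents are just the examples above, and "operating" the machine amounts to drawing one slip from each list; everything in it is purely illustrative.

```python
import random

# The three bags from the example above; each list is a bag of paper slips.
bag_1 = ["Thomas", "Zahra", "Sami", "Linda"]          # names
bag_2 = ["is a", "wants to be a", "was once a"]       # connecting phrases
bag_3 = ["doctor", "carpenter", "lawyer", "teacher"]  # professions

def operate_machine() -> str:
    """Pull one slip out of each bag and arrange them in the order 1-2-3."""
    return " ".join(random.choice(bag) for bag in (bag_1, bag_2, bag_3))

print(operate_machine())  # e.g. "Zahra was once a carpenter"
```

The output always "makes sense"; the question is whether producing it is writing.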
My suggestion is, appearances notwithstanding, that what language models in fact do is not something Barthes, Foucault, and Derrida would countenance as writing, any more than they would call our system of three bags an “écrivain”. Since these authors are “dead” in all the relevant senses, that’s not going to bother Mark and David, of course, so let me put it in more technical terms: what large language models do cannot be construed as writing even according to the “innovations” of the “postmodern literary theory” that Mark and David propose to “capitalize on”. The operations of ChatGPT are not “grammatological”; they do not make a “différance”. Their output, as a consequence, is not a “text” that can be “subject” to “deconstruction” or, even, I dare say, “reading”. It can of course easily be turned into text by a writer who puts their name to it, authorizing it and then, in order that it may be read, politely dying.*
I wish to make this argument by quoting passages from Barthes, Foucault, and Derrida as they appear in Mark and David’s text and simply challenging them to explain how they imagine ChatGPT carries out the operations required by even the postmodern conception of “writing”. Let’s start with Barthes.
Text is made of multiple writings, drawn from many cultures and entering into mutual relations of dialogue, parody, contestation, but there is one place where this multiplicity is focused and that place is the reader. … A text’s unity lies not in its origin but in its destination.
Roland Barthes, “Death of the Author”
Given what we know about how ChatGPT generates its output, it’s hard to see it “drawing from cultures” or “entering into mutual relations”. That is, this “multiplicity” that produces a text is entirely foreign to ChatGPT, which merely computes the next probable token in a string of tokens. I’m certainly curious to hear the analysis (or even deconstruction) of how the output is “made” as “text.”
Next, here’s Foucault:
Although, since the eighteenth century, the author has played the role of the regulator of the fictive, a role quite characteristic of our era of industrial and bourgeois society, of individualism and private property, still given the historical modifications that are taking place, it does not seem necessary that the author function remain constant in form, complexity, and even in existence. I think that, as our society changes, at the very moment when it is in the process of changing, the author function will disappear.
Michel Foucault, “What is an Author?”
But it’s important to recall that the disappearance of the author function does not mean anyone or anything can now “write”. Rather, new questions arise: “What are the modes of existence of this discourse? Where has it been used, how can it circulate, and who can appropriate it for himself? What are the places in it where there is room for possible subjects? Who can assume these various subject functions?” How, I want to know, can ChatGPT occupy these positions, execute these new functions?
Finally, let’s consider Derrida. Mark and David seem to think that for ChatGPT in particular “there is nothing outside the text”: “For the text of an LLM to make sense, the texts (and the contexts to which they refer) are enough. For this purpose, nothing more is needed.” ChatGPT, on their view, becomes not just a possible writer but an exemplary writer of non-logocentric text (better than Beckett? Better than Gertrude Stein?). But would Derrida agree?
‘There is nothing outside the text.’ That does not mean that all referents are suspended, denied, or enclosed in a book, as people have claimed, or have been naïve enough to believe and to have accused me of believing. But it does mean that every referent, all reality has the structure of a differential trace, and that one cannot refer to this ‘real’ except in an interpretive experience.
Jacques Derrida, Limited Inc
Does it not seem like Mark and David’s “nothing outside the text” is, in the case of LLMs, a matter of suspending, denying, or enclosing all referents in a book? Where, in the operations of ChatGPT, do we find it actually referring, i.e., producing a “differential trace” of the real? Where is ChatGPT’s “interpretive experience”?
Like I say, I want to leave this as a challenge. Mark and David have forced me to read Barthes, Foucault, and Derrida very closely these past few days, and that is of course rewarding all on its own. But the more I read them, the less likely it seems to me that they would countenance what ChatGPT does as any kind of “writing”. Sure, Barthes suborned the murder of literary authority, but he didn’t leave only a reader in its place. A “scriptor” was to take the author’s place. We could look more closely at what he thought this writer was doing. But I doubt we could ever conclude that ChatGPT is doing it.
*”Writing,” says Barthes, “is that neutral, composite, oblique space where our subject slips away, the negative where all identity is lost, starting with the very identity of the body writing.” It seems to me that this implies a body to begin with, a subject to slip away. Perhaps, when he so famously says that “the birth of the reader must be at the cost of the death of the Author,” we misunderstand him if we think the Author must die once and for all — that all authors die to make all readers possible. Rather, the author must precisely live, to do the writing, to be the body writing in practice, but must then die, if only in principle (or on principle, if you will), in order for the text to be read.
Let’s say we have a process that humans use, P-H, and a process that an AI uses, P-AI, to produce a human text T-H and an AI text T-AI. These outputs, T-H and T-AI, are indistinguishable: no one can reliably tell them apart.
Your claim is that P-AI is not writing, even though the outputs of these two processes are the same. P-AI is the wrong kind of process, a “merely” process (“merely [computes the next probable token in a string of tokens]”). I have made the point before that “merely” is doing a lot of work here, just as it would if we said, for example, that humans merely write one word after the other that best fit the previous words.
But what might be the right kind of process? You provide a description from Barthes.
You say “it’s hard to see it ‘drawing from cultures’ or ‘entering into mutual relations’.” I say it is hard to see otherwise, as this seems a very apt description of how LLMs work to produce texts.
This is what GPT-4 has to say on the matter:
The Multiplicity of Writings: LLMs as a Confluence of Voices
Origin of LLMs: Large Language Models, such as ChatGPT, are products of training on vast datasets that encompass an enormous variety of texts. These datasets might include literature from various eras, web pages, technical documents, conversations, and more. Consequently, any output generated by such models is not an “original” creation in the conventional sense. Instead, it’s a distilled and reconfigured representation of all the content it has been exposed to.
Echoing Barthes’ Textual Amalgamation: Barthes posited that any text is not a singular entity but an intersection of various prior “writings.” These writings refer to different styles, discourses, ideologies, and more that the author might knowingly or unknowingly imbibe into their work. Similarly, an LLM doesn’t create ex nihilo; it generates based on patterns and structures it has seen before. The “voice” of an LLM isn’t singular; it’s a chorus of the countless voices it has been trained on.
Absence of a Singular Authorial Voice: Traditional texts, even when influenced by multiple writings, are generally filtered through the perspective, style, and intent of one author (or a group, in cases of collaborative works). In the case of LLMs, this singular human perspective is absent. Instead, the output is a blend of countless authors, styles, and discourses without a singular guiding consciousness behind it.
Implications for Originality and Authenticity: This characteristic of LLMs challenges our notions of originality. If a piece of writing is a blend of many prior texts, can it be considered original? While Barthes might argue that true originality is a myth because every author is influenced by prior writings, LLMs take this a step further. Their outputs are direct, tangible amalgamations of prior works.
Potential for Infinite Variations: Given the vastness of their training data, LLMs can produce content that reflects a wide range of tones, styles, and genres. Two prompts can yield entirely different responses, showcasing the model’s ability to tap into the diverse “writings” it has been exposed to.
In essence, the “Multiplicity of Writings” evident in LLMs exemplifies Barthes’ idea that texts are intersections of multiple voices and influences. However, LLMs present this in a more pronounced manner. While human authors, knowingly or not, embed multiple influences into their works, LLMs function entirely by this principle, intertwining and reconfiguring the vast sea of voices they’ve been trained on to produce coherent outputs.
Suppose you have three processes that get you from point A to point B. Call them P-P, P-T, P-A. No matter which process you use (planes, trains, or automobiles), the end result is indistinguishable: you arrive at point B.
My claim is that flying is not riding and riding is not driving, though the outputs of these three processes are the same.
Humans do not “merely write one word after the other that best fit the previous words.” But ChatGPT doesn’t even know they’re words.
I’m sorry but I’m not going to debate with ChatGPT.
The point of contention and the target of the critique is this fragment: “these algorithms write without speaking, i.e. without having access to (the) logos and without a living voice.” This is, as you write, “the statement that I think is wrong.” And it is “wrong” because, as you claim in the title to your piece, “ChatGPT can’t write.”
I think you are, once again, agreeing against us. That is, I think what you offer as a critique is actually the point we have made in the essay. But everything depends on how you define (or, better stated, do not define) “write.” As indicated in the earlier X/Twitter exchange, you continually mobilize this word and seek to protect it from the apparent incursions of large language models, but you have not (at least not as far as I can tell) provided a definition of the word. Do you mean “write” in what Derrida calls the narrow sense of the word, e.g. arranging words (or linguistic tokens) in linear sequence on some tangible medium?
Or do you mean “write” in the logocentric sense, which Derrida, in Of Grammatology, locates in Aristotle’s De Interpretatione? “Spoken words are the symbols of mental experience and written words are the symbols of spoken words. Just as all human beings have not the same writing, so all human beings have not the same speech sounds, but the mental experiences, which these directly symbolize, are the same for all, as also are those things of which our experiences are the images.”
Or do you mean “write” in that specific Derridean sense, where “writing” (or better, arche-writing) is the result of the deconstruction of the speech/writing dichotomy that underlies and serves as the foundation of Western Metaphysics? “Writing, for example, no longer means simply ‘words on a page,’ but rather any differential trace structure, a structure that also inhabits speech. ‘Writing’ and ‘speech’ can therefore no longer be simply opposed, but neither have they become identical. Rather, the very notion of their ‘identities’ is put in question” (Barbara Johnson, Introduction to Derrida’s Dissemination).
Since you have not defined “write” in your writing, one can only guess. And if I had to guess, I would surmise that you have been working with the second definition. If this is correct, then you already agree with the statement that you seem to think is wrong. The statement “these algorithms write without speaking, i.e. without having access to (the) logos and without a living voice” means simply this: “these algorithms write (narrow definition, e.g. arranging words or linguistic tokens in linear sequence on some tangible medium), but they do so without there being a human subject who stands behind the written words and has something to say.” In other words, LLMs interrupt the logocentric model of writing.
Thus, you are right: LLMs can’t write in the way that we (Western European metaphysicians) understand and operationalize the word “write.” But that is, for Mark and myself, where things get interesting. Thank you for your reflection. It is, I think, gratifying to see how we are coming to the same (or at least substantially similar) conclusions by different routes.
I thought it was clear that I mean they can’t write in any of those senses. To say even that they write qua “arranging words” would be to say that my little three-bag (manually operated) machine can write, which it of course can’t.
But the important thing is that they don’t write in a sense, as I say, that Barthes, Derrida, or Foucault would countenance as writing. They do not, e.g., produce a “differential trace structure”.
My challenge to you and Mark was to explain how they do. Because if they don’t then it is hard to see how they pose a challenge for Western metaphysics. “The fundamental challenge (or the opportunity) with LLMs, like ChatGPT or Google’s Bard,” you say, “is that these algorithms write without speaking.”
Are you saying that you mean “write” here only in that narrow sense?
In that case, it is hard to understand the point of this paragraph in your paper:
“This non-representational view of language thus helps to make sense of what happens in the case of these LLMs: it helps us to understand why these texts can make sense at all (to humans)—indeed enables us to understand the very semantic-performative possibility of the texts generated by this technology—without relying on something outside the texts it finds on the internet. For the text of an LLM to make sense, the texts (and the contexts to which they refer) are enough. For this purpose, nothing more is needed.”
If they’re just putting words together in grammatical ways, why do we need a sophisticated non-representational view of language to make sense of it “at all”? We can just read it.
Two items
1) Unfortunately, LLMs do produce a “differential trace structure.” If not, their output would be illegible. It might help matters for you to (finally) define what you mean by “write” and how your formulation accords (or not) with what can be read in Barthes, Foucault, and Derrida.
2) As Derrida once told my grad seminar: “We do not arche-write. Arche-writing is the condition of possibility for us to speak and/or write.”
1) Do photocopiers “produce a differential trace structure”?
2) You’re picking nits, but it’s good to be precise: I say ChatGPT does not operate on the condition of possibility provided by arche-writing. It does not proceed on that ground.
1) Yes…photocopiers produce a differential trace structure. If they did not, you would not be able to distinguish the copy from its original.
2) Technology, whether that be writing (as described in Plato’s “Phaedrus”), a “writing machine” like your manually operated bags (basically what the Dadaists like Tristan Tzara did for generating poetry), or an LLM, operates on the condition of possibility provided by arche-writing. This is one of the main insights developed in Derrida’s essay “Plato’s Pharmacy.”
Is there a significant difference between the “challenge” posed by LLMs in 2022 and the photocopier in 1949?
I wrote in my previous comment:
“You say ‘it’s hard to see it “drawing from cultures” or “entering into mutual relations”.’ I say it is hard to see otherwise, as this seems a very apt description of how LLMs work to produce texts.”
Can you explain why it is that you don’t think LLMs produce outputs “drawing from cultures” or “entering into mutual relations”? It seems obvious to me that they do, yet it seems obvious to you that they don’t.
I usually begin with Blaise Agüera y Arcas’ explanation of how they work: “Neural language models aren’t long programs,” he said. “They consist mainly of instructions to add and multiply enormous tables of numbers together.”
We know that these calculations are done to decide the next word in a sequence. Now, I “draw on culture” and “enter into relations” when I write (these words, for example). I’m not just trying to decide what the next plausible word is.
It’s true that those “enormous tables of numbers” were constructed on the basis of a great many “related” texts in our “culture”. But when a language model generates its output, it’s not looking at those texts at all. It’s just looking numbers up in tables.
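To give a sense of what I mean, here is a deliberately toy sketch of what “adding and multiplying enormous tables of numbers” looks like when producing a next-word distribution. It is nothing like a real model: the vocabulary, sizes, random weights, and the crude averaging that stands in for attention are all just illustrative assumptions.

```python
import numpy as np

# A toy illustration, not a real LLM: producing the next word is an
# embedding lookup followed by matrix multiplications and a softmax
# over the vocabulary. All sizes and weights here are illustrative.

vocab = ["Thomas", "is", "a", "doctor", "teacher"]
d_model = 8                                  # tiny "hidden" size
rng = np.random.default_rng(0)

embeddings = rng.normal(size=(len(vocab), d_model))  # a lookup table of numbers
W_hidden = rng.normal(size=(d_model, d_model))       # one "layer" of weights
W_output = rng.normal(size=(d_model, len(vocab)))    # projection back to the vocabulary

def next_word_distribution(context_ids):
    # Look numbers up in a table, then add and multiply them together.
    h = embeddings[context_ids].mean(axis=0)  # crude stand-in for attention
    h = np.tanh(h @ W_hidden)
    logits = h @ W_output
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()                # probability of each next word

print(dict(zip(vocab, next_word_distribution([0, 1, 2]).round(3))))
```

At no point does the procedure consult the texts the tables were derived from; it only consults the tables.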
What we do when we write requires imagination. ChatGPT doesn’t have any.
Two problems in your argument:
1. You focus just on the operation of the final output layer of LLMs. The output layer of an LLM produces a probability distribution over next tokens (glossed as the “next most probable word”). But the operation of the easy-to-describe input and output layers is very different from that of the layers in between. Most of the good stuff happens in the middle, though it is more inscrutable.
2. More importantly, you provide a (mis)characterisation (see 1.) of the complete computational process P-AI, which you argue is the wrong kind of process, but you provide no computational description of the right kind of (human) process that would be necessary for the human process to count as an instance of writing. Your phenomenological experience is not very helpful here for characterising the underlying computational processes.
I think we reached the same place in a previous discussion: I infer that your view is that writing is writing only if it is accompanied by phenomenological experience (or perhaps a particular kind of phenomenological experience). If so, this seems more defensible than the computational argument.
Yes, I don’t think the human process is computational. The correct description of writing is phenomenological. It is precisely because the LLM can be described in entirely computational terms that I don’t countenance what it does as writing.
And I don’t think any part of what an LLM does is inscrutable. We know exactly how it does what it does.
I believe that the human process is ultimately computational, and if we are comparing the two it is fair to compare them on equal terms, but I understand the argument for why certain terms should be reserved for humans (or for intentional agents with more in common with humans than LLMs currently have).
In my previous comment I said that by focusing only on the output layer of LLMs you mischaracterise their operation. That you say no part of an LLM is inscrutable reinforces that point. Do you disagree that your focus on the output layer is a problem? And can you describe why you don’t think the hidden layers of LLMs are inscrutable?
I would say it is true that, unlike human brains, we have a complete understanding of several aspects of the operation of LLMs, including the physical substrates and how artificial neurons operate. But as I have pointed out in previous comments, we barely have a mechanistic understanding of how even very simple toy models realise their functions (i.e., map from an input to an output). Scaling this up to large models is a huge challenge, such that mechanistic interpretability of large language models is, I believe, thought to be one of the hardest problems in computer science right now. Hence, pretty inscrutable.
The hidden layers are, as I understand it, only “hidden” relative to our interaction with the model. The weights in all the layers can be inspected, and all the calculations that the algorithm does can in principle be done manually. It would just take a hell of a long time.
We also know how the weights are arrived at through training. We know how back propagation works, etc.
I don’t claim to be a technical expert, and maybe you can correct me on this, but this is the understanding I have gotten from consulting what I take to be the most credible expertise available.
I really don’t think there is any plausible analogy between neural networks and human brains qua “cognitive function”. It’s just and only a metaphor.
What you say in your response is all true, but everything I said in my previous comment is all true also.
The hidden layers learn a mapping from an input to an output (a function). We can fully describe mathematically how information flows through the system, based on the algorithm, the weights (from learning), and the activations. But we lack interpretability of how those functions are realised in all but the simplest neural networks. We don’t understand how they do what they do, what the system has learnt about its inputs, how this is transformed, what the different layers do and how they combine. This is for several reasons, mainly that extremely high-dimensional non-linear processes lead to very complex interactions with emergent behaviours.
In regard to your last point, there is a lot of work using aspects of LLMs, such as convolutional neural networks and transformers, as useful models for understanding how brains realise cognitive functions.
“We don’t understand how they do what they do, what the system has learnt about its inputs, how this is transformed, what the different layers do and how they combine.”
This only seems correct to me if we interpret “what they do” at a higher order than they actually do it. That is, if we try to understand how they execute “cognitive” functions. We don’t know how they “master subject-verb agreement” or how they “reason about causes”.
But the mystery is solved when we realize that they don’t actually do any of these things. They have zero knowledge of grammar or physics. They only *appear* to.
They are models of word frequencies and “what they do” is to predict the next word. They are *very* good at this. And we know exactly how they do it. They compute the probability that any word will be next and then they stochastically choose among the most likely candidates. Then they do it again.
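If it helps, here is roughly what that loop looks like when written out. It is a hedged sketch only, with `next_word_probabilities` standing in for the model’s actual forward pass (which is assumed, not shown); the names and defaults are illustrative.

```python
import random

# A sketch of the generation loop described above: compute the probability
# of each candidate next word, stochastically choose among the most likely
# candidates, append the choice, and then do it again.

def generate(prompt_words, next_word_probabilities, n_words=20, top_k=5):
    words = list(prompt_words)
    for _ in range(n_words):
        probs = next_word_probabilities(words)              # {word: probability}
        candidates = sorted(probs, key=probs.get, reverse=True)[:top_k]
        weights = [probs[w] for w in candidates]
        words.append(random.choices(candidates, weights=weights)[0])
    return " ".join(words)
```

Nothing in the loop knows that the strings it is appending are words.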
As I have noted several times above, it is a problem if you focus just on the output layer in talking about these models. Yes, these models are *very good* at next word prediction. But only when you look at what the rest of the model is doing can you begin to understand *why* they are very good at this.
It would be interesting to hear your short account of “why they are good at” next word prediction. As I understand it, it is not because grammar or knowledge is encoded in the neurons (as it seems to be in the human brain). It is literally because the probability of each word being next is modeled.