ChatGPT: the end of machine learning


Okay, not the end of machine learning in general, but the precise title would be way less fun.

The GPT language generation models, and the latest ChatGPT in particular, have garnered amazement, even proclamations that general artificial intelligence is nigh. It’s not. ChatGPT is evidence that the whole approach is wrong, and further work in this direction is a waste.

The GPT models assume that everything expressed in language is captured in correlations that provide the probability of the next symbol. That is, if I take a giant corpus of language and I measure the correlations among successive letters and words, then I have captured the essence of that corpus. This is what is known as the statistical approach in natural language processing.
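The core idea can be sketched with a toy bigram model — vastly simpler than GPT’s transformer, but the same statistical principle of predicting the next symbol from measured correlations. (The function names here are mine, purely for illustration.)

```python
from collections import Counter, defaultdict

def train_bigram_model(corpus):
    """Count how often each word follows each other word
    in a corpus given as a list of token lists."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        for prev, nxt in zip(sentence, sentence[1:]):
            counts[prev][nxt] += 1
    return counts

def most_likely_next(counts, word):
    """Return the highest-probability next word after `word`, or None."""
    if word not in counts:
        return None
    return counts[word].most_common(1)[0][0]

corpus = [
    "the cat sat on the mat".split(),
    "the cat ate the fish".split(),
]
model = train_bigram_model(corpus)
print(most_likely_next(model, "the"))  # "cat" follows "the" most often here
```

GPT replaces the counting with a neural network and the single preceding word with thousands of tokens of context, but the output is still a probability distribution over the next symbol.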

The statistical approach took off because it made fast inroads on what had been considered intractable problems in natural language processing. The details of morphology of words? You could puzzle out theories for them for each language, informed by other languages in its family, and encode them by hand, or you could feed a huge number of texts in and measure which morphologies appear in which contexts.

Fast forward decades and an enormous amount of money later, and we have ChatGPT, where this probability based on context has been taken to its logical conclusion. Before this point, the models were always too limited in what they could understand and generate, too narrow in the material in their corpus, to really experiment with what the approach can do. ChatGPT is good enough that we can type things to it, see its response, and adjust our query to test the limits of what it’s doing, and the model is strong enough to give us an answer rather than failing because it ran off the edge of its domain.

It fails in several ways.

The first way it fails we can illustrate with palindromes. It can give you strings of text that are labelled as palindromes in its corpus, but when you tell it to generate an original one, or ask it whether a string of letters is a palindrome, it usually produces wrong answers. Palindromes are not a task where correlations predicting the next symbol help you. The system needs the ability to instantiate and play symbolic games. There is no such layer in ChatGPT. Palindromes might seem trivial, but they are the trivial case of a crucial aspect of AI assistants. If you are going to ask a program to schedule dinner for three people based on their calendars and make a reservation at a restaurant for them, the system must be able to handle symbolic games. If it can’t handle trivial ones, there’s no hope for more sophisticated ones.
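To underline how trivial the symbolic game is: the check that a next-symbol model reliably fumbles is a few lines of ordinary code, using the usual convention of ignoring case, spaces, and punctuation. (A sketch; `is_palindrome` is my name for it.)

```python
def is_palindrome(text):
    """Check whether `text` reads the same forwards and backwards,
    ignoring case and non-alphanumeric characters."""
    letters = [c.lower() for c in text if c.isalnum()]
    return letters == letters[::-1]

print(is_palindrome("A man, a plan, a canal: Panama"))  # True
print(is_palindrome("row your boat"))                   # False
```

The point is not that the check is hard — it obviously isn’t — but that it requires executing a symbolic procedure over the string, which is a different kind of operation from predicting the next token.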

The second way it fails is being unable to play language games. Try getting it to sing (well, type) “Row, row, row your boat” with you as a round. It is incapable of developing an internal representation of a language game in order to follow that game’s rules. Human interaction, even very prosaic discussion, has a continuous ebb and flow of rule following as the language games being played shift. Someone interjects a humorous remark, someone else riffs on it, then the group, by reading the room, refocuses on the discussion: a cascade of language games. Similarly, a discussion where you are trying to arrive at a conclusion is simply not possible with ChatGPT, because it cannot adjust as the discussion proceeds and establishes changes in the language game.

Finally, the model openly says “I am not able to create or suggest connections between concepts that do not already exist,” which means that it is a useless tool unless your interaction with it stays on paths well enough trodden to be mapped fully in its corpus. I am reminded of the bar scene from Good Will Hunting.

So: statistical language generation, after decades of hard research (and some of the creations along the way, like transformer networks, are quite amazing), has reached its culmination, and the final lesson is that, without something more than statistical correlations, this is a dead end. The field is going to have to dust off classical natural language processing and figure out how to wire up something that we can deal with via theory of mind, to judge whether the system is playing the language game we want it to play correctly.

Hunks of that are clearly going to use statistical techniques. The practical AI assistants like Siri and Google Assistant already do that: first guess which of a fixed set of language games it’s playing, then process the input in that context. What about doing it more generally, where it can learn language games and follow rules about conceptualizing and playing symbolic games? Then the existing games must be in the same form as the games it is supposed to learn, so they’re malleable. And that means they have to be trained into the live system from some minimal set. We spend decades doing exactly that with each human child. When humans learn to keep birds as pets, they have to learn the innate body language and its language games for birds, which are quite different from those of mammals. A path to general artificial intelligence is going to involve much more teaching and animal training than programming and curating texts.