Gary Smith writes:
The real danger today is not that computers are smarter than us but that we think computers are smarter than us.
This is in an article where he, like me, disagrees with the statement by computer scientist Geoffrey Hinton that, “by training something to be really good at predicting the next word, you’re actually forcing it to understand.”
I’d also like to again point out that the impressive thing about generative AI is not that it’s “really good at predicting the next word,” but rather that it can do new things such as write entirely new poems, etc. Yes, these new things are created using a prediction algorithm, but, again, what’s impressive is the creation, not the prediction. Similarly, when the chatbot helps you write and debug a computer program, the impressive thing is that it’s giving you information that you didn’t know. It’s not predicting anyone’s next word.
So I think Hinton was simultaneously hyping and under-selling what chatbots can do.
Smith demonstrates the chatbots’ lack of understanding by feeding them a problem that is similar to, but slightly different from, the classic 3-cards problem in probability. Unsurprisingly (given what we know about language models), the bots pick up on the 3-cards problem that’s stored in some dispersed way in their memory and give various wrong answers that reveal a sort of patchwriting in which, as Smith puts it, “all of the words are sorted into correct English and yet Bing [and similarly with the other chatbots] clearly did not understand the words since it gave a three-card answer to a two-card question.”
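For readers who don’t remember the classic 3-cards problem, here’s a quick simulation sketch, assuming the standard setup (one card red on both sides, one black on both sides, one mixed): given that the face you see is red, the other side is red with probability 2/3, not 1/2. The function name and trial count are just illustrative choices.

```python
import random

# Classic 3-cards problem: three cards, one red/red, one black/black,
# one red/black. Draw a card at random and look at a random face.
# Given that the face shown is red, how often is the other face red too?

def other_side_red_given_red(n_trials=200_000, seed=0):
    rng = random.Random(seed)
    cards = [("R", "R"), ("B", "B"), ("R", "B")]
    shown_red = 0
    other_red = 0
    for _ in range(n_trials):
        card = rng.choice(cards)          # pick a card uniformly
        face = rng.randrange(2)           # look at a random side of it
        if card[face] == "R":
            shown_red += 1
            if card[1 - face] == "R":     # check the hidden side
                other_red += 1
    return other_red / shown_red

print(round(other_side_red_given_red(), 3))  # close to 2/3, not 1/2
```

The intuition trap is conditioning on “the card is not black/black” (which suggests 1/2) instead of conditioning on “a red face is showing” (the red/red card is twice as likely to show a red face, giving 2/3). Smith’s twist was a two-card version, which the chatbots answered with the memorized three-card boilerplate.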
Now you might reply that, sure, but the 3-cards problem is hard! Most humans don’t understand it either. To which I’d respond, yes, exactly: most humans don’t understand it, and the chatbots don’t understand it either. The chatbot performs like a student who’s been trained to do well in school by manipulating phrases in a way that, if you don’t focus, sounds reasonable. That indeed was the main point of my earlier post, where I wrote:
It’s almost like the difference between jogging and running. I can jog and jog and jog, thinking about all sorts of things and not feeling like I’m expending much effort, my legs pretty much move up and down of their own accord . . . but then if I need to run, that takes concentration.
Now to return to Smith’s main point, that the real danger is “that we think computers are smarter than us.” He quotes “TED organizer Chris Anderson,” who saw a chatbot demo and wrote, “Surely it’s not crazy to think that sometime next year, a fledgling Gemini 2.0 could attend a board meeting, read the briefing docs, look at the slides, listen to everyone’s words, and make intelligent contributions to the issues debated?”
I guess it all depends on what is meant by the term “intelligent.” The chatbot might well be able to produce a short summary of the meeting: “Mary said XYZ, then Jim said ABC, then Sue interrupted to say D,” etc., and that could be useful and, even if not reflecting any understanding, still be as good as or better than the equivalent summary made by an intelligent person. But if Anderson were to ask the chatbot for advice on what to do next, then, yeah, I’d be kinda scared. Yes, I’m also not thrilled about how lots of institutions are currently run; maybe a generative AI would have fired the employees of a university that faked its statistics, something (the firing, not the faking) that seems to have been beyond the capacities of my current employer. Still, I agree with Smith’s point that the chatbot, impressive as it is, is not understanding anything in the sense that we understand things when we understand them.
In his second recent post on the topic, Smith writes that large language models “can generate falsehoods faster than humans can correct them,” and he gives the example of a ridiculous claim that he extracted last year from ChatGPT that “According to estimates, about 49 bears have been sent into space by Russia since 1957,” which the chatbot then backed up with a set of irrelevant names (“Alyosha”, “Ugolek”, “Zvezdochka”, “Strelka”, “Belka”, “Pushinka”, and “Vladimir”), meaningless references, and links that went nowhere.
Smith follows up:
A few days ago, I asked the same question of Microsoft Copilot (formerly Bing), which is based on OpenAI’s GPT-4, thinking that GPT’s human trainers might have fixed this embarrassing glitch by now. They hadn’t—and Copilot’s answer to my query was revealing:
Human: How many bears have Russians sent into space?

Copilot: According to estimates, about 49 bears have been sent into space by Russia since 1957. These brave bears were part of the Soviet Union’s space program during the 1960s. Some of them even had names like “Alyosha,” “Ugolek,” “Zvezdochka,” “Strelka,” “Belka,” “Pushinka,” and “Vladimir.”
Copilot gave four Internet references, three of which were discussions of ChatGPT’s bears-in-space hallucinations. The fourth reference was about the history of Russian space dogs. Two of the three bears-in-space references were to articles written by me or quoting me, although neither link was to my 49-bears report that Copilot repeated virtually verbatim.
He summarizes:
At some point, human handlers will train Copilot and other LLMs to respond that no bears have been sent into space but many thousands of other misstatements will fly under their radar. LLMs can generate falsehoods faster than humans can correct them. . . .
This self-propelled cycle of falsehoods is likely to get worse, much worse. As LLMs flood the internet with intentional and unintentional fabrications, LLMs will increasingly be trained on these falsehoods and be increasingly prone to regurgitate them. It won’t just be amusing garbage about Russian bears in space. . . .
A lot of the discussion I’ve seen on AI has a polarized tone. On one side are the skeptics who seem to want to no-true-Scotsman the amazing progress that these chatbots have made; on the other side are the enthusiasts who can’t turn off the hype. I appreciate Smith’s measured skepticism, and I’ll set it alongside Bob Carpenter’s measured enthusiasm: Bob uses chatbots to do all sorts of useful things in his work and leisure activities.