ChatGPT: We need a new definition of intelligence

From coding to chess, computers keep outperforming humans at cognitive tasks, revealing a huge gap in our understanding and definition of intelligence.  We need to consider what lies between the capabilities of computers today and a hypothetical Artificial General Intelligence.

For decades, scientists and philosophers have debated, sometimes rather vigorously, whether a computer could ever be truly intelligent and how we might know for sure.  The pioneering mathematician and early computer scientist Alan Turing proposed his famous test for machine intelligence in 1950, long before computers became a ubiquitous part of our lives or were capable of exhibiting anything like intelligent behavior.  Turing was a farsighted genius, however, and he discerned two things that remain relevant today.  First, it was only a matter of time before the complexity of a computer program increased to the point where it began producing unexpected outputs, outside of what its creator had originally conceived.  He was responding to one of the founders of computing in the mid-19th century, Ada Lovelace, who claimed that the original Analytical Engine had “no pretensions whatever to originate anything.”  Turing took that to mean a machine could never surprise us, noting to the contrary that “machines take me by surprise with great frequency.”  Second, Turing recognized that the opacity of other minds, our inability to see the machinery inside another person, limited our ability to test for intelligence.  Hence, he proposed that any test be based on output: computers that behaved in a manner we consider intelligent were intelligent.  This would be established through a period of rigorous questioning by a human interrogator.  If the human could not say with certainty whether they were interacting with a machine or another human, the machine must be intelligent.  In the more than 70 years since, no one has come up with anything better.

Implicit in this view is the idea that a hypothetical machine intelligence would function like human intelligence.  The underlying processes would necessarily be different, neurons replaced with circuits and thoughts replaced with code, but the end result would be similar to us, or at least sophisticated enough to simulate humans perfectly, which is a distinction with little difference.  In fact, Turing himself never attempted to define intelligence, and his test does not require a specific definition; rather, he assumed that we would know it when we saw it.  This was a rather ingenious way to solve an underlying problem: no one, throughout all of human history, has been able to satisfactorily define intelligence.  To this day, it lacks an adequate definition.  Is intelligence creative problem solving?  The ability to learn?  Directed behavior?  Conscious experience?  Language and communication?  Some combination of all of the above?  For Turing, it did not matter.  Philosophers and computer scientists have since refined Turing’s original ideas and labeled this sort of intelligence Artificial General Intelligence, the ability of a machine to learn and understand any task a human can, but the end result is largely the same with a different name.  This lack of a definition was an issue only in theory, however, widely debated among philosophers and few others, until technology advanced to the point where computers began besting humans in more and more cognitive tasks.  Today, there are precious few such tasks where humans remain superior to machines.

ChatGPT has gotten a lot of attention for its ability to write as well as the average person, code well above average, pass complex exams like one from Wharton’s Master of Business Administration program, and perform other sophisticated mental tasks, but over the past two decades computers have been steadily encroaching on previously human domains.  There is not a human in the world who can beat a computer at chess, Go, or Jeopardy, for example.  Computers can also create artwork, compose music, and write poems at a reasonably sophisticated level.  At the same time, few believe we have achieved the goal of Artificial General Intelligence, despite some reports to the contrary.  Instead, these machines are purpose-built, with carefully tailored algorithms and data sources, to perform a certain task incredibly well.  ChatGPT, for example, is known as a “generative language model,” designed to mimic, but not achieve a semantic understanding of, language.  The specifics of this need not concern us here, except to compare these specialized models to how a general, or perhaps I should say human, intelligence would function.  Humans are also equipped with a generative language model of a kind.  The brain has three dedicated areas for speech, comprehension, and memory retrieval related to communication.  Broca’s area is considered the speech processing center, containing business rules for the construction and deconstruction of sentences.  Wernicke’s area supports comprehension, and the angular gyrus facilitates retrieval of related memories.  All three of these areas operate on a subconscious level.  We cannot peer into our brains and see what’s happening inside.  Instead, they work smoothly behind the scenes in most people without requiring any thought whatsoever, but without them we would not be able to communicate effectively.  Their influence is also so powerful that humans who have never been taught a language will spontaneously invent primitive forms of communication.
The famed linguist Derek Bickerton spent decades studying this phenomenon around the world, identifying multiple cases where even deaf children who were never exposed to speech began developing signs on their own before the age of five.  As he and others have found, the language instinct is strong in people: it occurs automatically, and we do not need to know anything about the theory to start doing it in practice.
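ChatGPT’s model is vastly more sophisticated, but the core idea of a generative language model, producing fluent-looking text from statistical patterns alone with no semantic understanding, can be illustrated with a toy sketch.  This is only an illustration of the principle, not how ChatGPT works; the corpus and function names here are invented:

```python
import random
from collections import defaultdict

def train_bigrams(text):
    """Record, for each word, which words follow it in the corpus."""
    words = text.split()
    follows = defaultdict(list)
    for a, b in zip(words, words[1:]):
        follows[a].append(b)
    return follows

def generate(follows, start, n=8, seed=0):
    """Emit up to n words by repeatedly sampling a statistically
    plausible next word.  The model has no notion of meaning at all,
    only co-occurrence counts."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(n - 1):
        options = follows.get(out[-1])
        if not options:
            break
        out.append(rng.choice(options))
    return " ".join(out)

corpus = ("the lily smells sweet " * 3) + ("the weed smells rank " * 2)
model = train_bigrams(corpus)
print(generate(model, "the"))
```

The output is grammatical-looking precisely because it parrots the corpus, yet nothing in the program “knows” what a lily or a weed is, which is the sense in which such models mimic language rather than understand it.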

Of course, there are limitations to how far this instinct can go and to the complexity of any proto-language.  Language also operates on another level of thought and meaning.  These levels can be closely tied together, as when dealing with objects that are present before us, like food or rocks, where proto-languages are sufficient for most interactions, or they can be further removed, as when we talk about food or rocks generally.  In those cases, language needs to be complex enough to capture some aspect of what food and rocks mean in an all-purpose sense, outside of any specific instance.  Once again, exactly how this level is achieved, whether meaning is objective or subjective, fixed or contingent, and so on, has been a subject of debate for generations, but this too need not concern us.  We need only accept that this level exists in some form.  If you doubt that it does, consider how we interpret a poem.  To a large extent, poetry is poetry because it operates at these multiple levels of meaning.  There is the literal sense of what the author wrote on the page, and then there is the figurative meaning operating above and beyond mere words, imparted by our experience reading it, emotion, meter, rhyme, etc.  Thus, when Shakespeare wrote “They that have power to hurt and will do none, That do not do the thing they most do show,” we know that the speaker is in fact describing a person who is the opposite: a person who has hurt him or her; a person who behaves on an entirely superficial, surface level.  Likewise, later in the same sonnet, Sonnet 94, when the speaker concludes “Lilies that fester smell far worse than weeds,” we know he or she is not talking about rotting plants at all, but rather about a promising relationship that has withered and died, which, because it was beautiful, is made even more disgusting in its demise.

Further, one need not have any specific knowledge of this particular poem, Shakespeare, or poetry in general to discern this meaning.  One need only be human, and spend a little time considering what might be lurking beneath Shakespeare’s explicit statements.  ChatGPT, on the other hand, cannot access this level of meaning.  Words and sentences are to it purely syntactic constructions to be broken down on a literal level.  If you ask it to analyze a poem, the software looks up what others have said about it and assembles a response.  It does not experience the poem as human intelligence does, and as presumably an Artificial General Intelligence would.  ChatGPT exists on the instinctual, unconscious level of language, similar to the three areas of our own brains, not the figurative, implied level that conscious thought produces.  I’ve chosen poetry here as an example, but the same is true of all other current forms of Artificial Intelligence.  Whether the machine is playing chess or creating music, it relies on fine-tuned generative models to excel at a specific task.  These models are frequently far superior to humans, but they cannot impart meaning to what they are doing.  The chess computer does not know why it plays, is not curious about the nature of the game, does not anticipate the next game, worry about winning or losing, or think about the game at all beyond the move before it.  It simply plays.  Nor can these generative models be applied to other endeavors.  ChatGPT will never take its communication skills and apply them to making movies or playing songs.  The machine that excels at chess cannot adapt itself to another game like Go, taking skills in one area and applying them to an adjacent one.  Each is programmed and fine-tuned for a specific function or set of functions.

Humans, of course, and presumably an Artificial General Intelligence, do not suffer from this limitation; hence the gulf between Artificial Intelligence today and what it might be tomorrow.  It is, however, difficult to say for sure precisely how wide this gulf is, given that few, if any, anticipated one in the first place when Turing and others were considering what machine intelligence might look like seventy years ago.  Still, the gulf is certainly real, and measuring it properly necessitates reconsidering the different possible types of intelligence.  If these incredible machines are not intelligent in a human or general sense and yet can outcompete humans in many intelligent endeavors, what precisely are they?  Putting this another way, it was easy to claim computers weren’t intelligent when they were performing tasks by rote and simply crunching numbers.  Now, however, they are clearly doing far more than that, exhibiting a range of unexpected behavior not directly programmed into them and which we cannot easily explain.  Therefore, we can only reasonably conclude that they are intelligent in some sense, even if it is not the same as our intelligence or the theoretical Artificial General Intelligence.

The question is: what sense is that?  Precisely how to answer is obviously subject to debate and a wide range of opinions, and I can claim to be neither a computer scientist nor a philosopher, but two frameworks present themselves, and they are not mutually exclusive.  First, rather than viewing intelligence generally, we can break it down into different categories of behavior and develop a measurement system for each.  There is broad agreement that intelligence is composed of multiple facets, including the ability to learn from experience, recognize problems, and solve them, sometimes known as practical intelligence, creative intelligence, and analytical intelligence.  Beneath these higher-level skills, there is associative memory, numerical ability, perceptual speed, reasoning, spatial visualization, verbal comprehension, word fluency, and more.  Generative models would likely score incredibly high in some of these areas but poorly in others, with a wide variance based on the purpose of the model.  ChatGPT, for example, would excel at verbal comprehension and word fluency while doing poorly at spatial visualization or perceptual speed.  In some cases, there are existing metrics for these skills.  The Wechsler Adult Intelligence Scale, for example, bases its scoring on perceptual reasoning, processing speed, verbal comprehension, and working memory.  These metrics will be insufficient to capture the subtlety of Artificial Intelligence, but they can serve as an obvious starting point.  We should also be sure to include task-based performance, and develop more refined ways to measure a machine’s capabilities.  For example, when we say ChatGPT performed above the average computer programmer, what precisely was being tested?  The ability to break a problem down into the parts of an algorithm, or the ability to repurpose what others have already done?  Likewise, when we say the program passed an MBA exam, were the questions based on facts that are publicly available or on reasoning that must be applied?

Ultimately, this leads to the second point, and one of the key challenges in evaluating the performance of an Artificial Intelligence.  ChatGPT effectively has instant access to the entirety of human knowledge.  If you give it a problem that someone else has solved, the computer will find that solution and present it to you, but this does not mean that ChatGPT itself has offered any solution of its own.  Many cognitive tasks today are based on copying and adapting what came before.  Few computer programmers, for example, write their own code from scratch.  They find something similar and adapt it.  ChatGPT will necessarily be better at this than anyone alive, but that is not the sum total of what it means to be an exceptional computer programmer.  The creators of these Artificial Intelligences need to develop ways to evaluate their performance when they are connected only to a limited data set.  The goal should be to separate what the machine is producing from what it finds that others have produced.  The classic vision of an IQ test was to administer it independent of any learned knowledge, to evaluate the cognitive processes on their own without the benefit of knowing the answers in advance.  Right now, ChatGPT and other systems know all these answers in advance, making it difficult to say whether they are producing anything intelligent on their own.  There is no reason this needs to be the case.  The developers should begin publishing more detailed analyses of precisely what is in the data set and whether the software actually added anything new; otherwise, the end result is nothing more than an incredibly powerful, interactive search engine.
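As a rough illustration of the kind of analysis developers could publish, one could measure how much of a model’s output appears verbatim in its training data.  This is only a crude sketch of the idea with invented names, not a description of how any real system is evaluated:

```python
def ngrams(words, n):
    """All length-n word sequences appearing in a list of words."""
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def novelty(training_text, output_text, n=3):
    """Fraction of the output's n-grams never seen in training:
    a crude proxy for whether the model produced anything new
    or merely retrieved what others have already written."""
    seen = ngrams(training_text.split(), n)
    produced = ngrams(output_text.split(), n)
    if not produced:
        return 0.0
    return len(produced - seen) / len(produced)
```

An output copied wholesale from the training data scores 0.0, pure retrieval, while an entirely original passage scores 1.0; real evaluations would need to be far more sophisticated, but the principle of separating retrieval from production is the same.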

Finally, we must be careful of the human tendency to personify everything and impart meaning where there is none.  It should be no surprise that ChatGPT is being called sentient by some while other advanced Artificial Intelligence models that arguably perform more sophisticated tasks on an even higher level are not.  AlphaGo, for example, is said to have developed new strategies that upended centuries of tradition, but no one claimed it had a soul.  This is because only ChatGPT talks like us, and that alone leads us to believe it must be thinking.  This can be highly deceiving, however.  A madman or an Alzheimer’s patient can babble in seemingly intelligent sentences, but not mean a thing.  ChatGPT, at least right now, is likely doing the same thing.  The major advance since Turing’s day is that the combination of complex business rules and almost inconceivable amounts of data produces outputs that would previously have been described as requiring intelligence.  I do not doubt that some of these tasks do require models of parts of intelligence, even if they work in ways that were completely unexpected, but unfortunately, we will not know for sure until we re-evaluate our definition of intelligence in the first place.

