Artificial Intelligence, the nature of creativity, and the difference between humans and machines

In a world where computers are creating content, from text to images, video, and even songs, can they rightly be considered artists, or is there still something that makes humans unique?

Today’s Artificial Intelligence software can create new things, generating content that has never been produced before, from the written word to pictures, videos, even music.  While critics may debate the artistic merit of these creations, the fact that computers are bringing into existence things that are entirely new is beyond dispute, as is the reality that many of these creations are unique and unexpected.  Some of us have probably seen the stories online about AI behaving in strange, sometimes bizarre ways, but even on a day-to-day basis these new tools are busy creating content of all kinds, often so well that most people don’t realize it was produced by a machine.  Personally, I now use AI almost exclusively for the images on this blog.  Before I publish a post, an AI tool summarizes the content into a single paragraph, rephrases it as an image description along with notes like “photo real” or “stylized” that suit the content, and then generates the image itself in about 15 seconds.  After using it repeatedly for months, two things jump out.  First, while there are times when the image makes little sense and the computer has obviously misinterpreted the intent, far more often it comes up with something meaningful, often on the first try, and sometimes it creates something better than I’d imagined, more insightful than anything I had in mind before I pressed the button.  Second, it never produces the same image twice, even from the same prompt, as though the computer were a human drawing with pen and ink, unable to replicate the exact same strokes.  On occasion, two images generated from the same prompt will be wildly different, not close to one another in any sense, representing opposite interpretations of the prompt.  This is true even though Generative AI, the subset of the broader field devoted to software and algorithms that produce output from prompts, relies on a prediction engine: the machine is guessing what a human would expect to see, read, or hear, producing content based on its internal representation of our expectations.  Essentially, the underlying software uses a variety of techniques and learned relationships to predict what word, pixel, frame, or note should come next based on the data used in training.  By analyzing almost everything that has ever been created and digitized, the computer can use that knowledge to pick the most likely next word in a sentence, for example, building sentences, paragraphs, and more that are both natural and meaningful.
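
To make that last idea concrete, here is a minimal, purely illustrative sketch in Python, a toy bigram counter that is nothing like the scale or architecture of a real model, of what “predicting the next word from training data” means in practice (the training text and words are made up for the example):

```python
from collections import Counter, defaultdict

# A tiny body of "training" text (purely illustrative).
training_text = (
    "the rabbit sat in the grass . "
    "the rabbit ran into the burrow . "
    "the fox sat in the grass ."
)

# Count which word follows which word in the training data (a simple bigram table).
follows = defaultdict(Counter)
words = training_text.split()
for current_word, next_word in zip(words, words[1:]):
    follows[current_word][next_word] += 1

def predict_next(word):
    """Return the word most often seen after `word` in the training text."""
    candidates = follows[word]
    return candidates.most_common(1)[0][0] if candidates else "."

# Build a short continuation by repeatedly picking the most likely next word.
sentence = ["the"]
for _ in range(5):
    sentence.append(predict_next(sentence[-1]))
print(" ".join(sentence))
```

A real system does the same basic thing with billions of learned parameters rather than a lookup table of counts, which is where the fluency comes from.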

While we don’t know precisely how this process works in the human mind, centers in the brain have been identified that are dedicated to processing language and images, and at least on the surface they appear to serve a similar function.  We might not describe these centers as prediction models in our day-to-day lives or even in scientific circles, but the function and output are essentially the same.  Returning to language as an example, we don’t need to think about every word in a sentence, either those spoken to us or those we speak to others.  Instead, somewhere below our consciousness, the brain processes inbound and outbound statements, extracting and conveying meaning in a (largely) predictable fashion.  Though the underlying mechanisms are likely quite different given that our brains run on neurons rather than transistors, the process wouldn’t work if there weren’t significant commonalities.  We don’t know how, but something in our minds associates individual words with our internal lexicon and processes them according to another internal library of semantics and sentence structures.  Because we cannot read another’s mind or directly access their thoughts, the combination of these two processing layers must include some mechanism for predicting both the meaning of what we’re being told and how a listener will react to what we say, the meaning they will construct on their own.  The philosopher W. V. O. Quine used the term “stimulus meaning” to describe both the knowledge shared across minds that speak the same language and the prompt that would cause someone to utter a specific sentence.  In his view, speakers of the same language maintain internal lists of associations for a given stimulus.  For example, a rabbit in the grass would prompt a speaker to declare, “There’s a rabbit,” and any listener who shared the same stimulus meanings, or close enough, would understand what was meant by comparison to his or her own internal list.  Likewise, observing the entrance to a burrow might prompt the same statement and response.  In both cases, the speaker is able to unconsciously predict that the listener will understand the statement because their stimulus meanings are shared, and vice versa.  Thus, the output of our own language processing centers is similar enough to that of Generative AI models that we can consider them equivalent in more ways than not, making it somewhat immaterial what the underlying process is and whether a computer should technically be called an author, a designer, or whatever.

Crucially, one of these similarities introduces a measure of unpredictability or randomness into the output.  Setting aside the complexity of the data set, AI models include parameters that influence results in unpredictable ways, making each answer generated by the computer unique, at times in an almost stunning fashion, what we might describe as exhibiting some level of creativity.  Humans, of course, do the same.  Whatever is happening in our minds, we do not always choose the same words and phrasings, sometimes surprising ourselves when we come up with something entirely new.  Some of us are better at this than others, or perhaps more prone to it, suggesting that whatever model our brains use also has a random component, whatever its origin, one that can likewise be described as creativity.  If this is the case, and admittedly some might call it speculative, can computers rightly be considered creative?  The answer is, not surprisingly, quite complicated, largely depending on how you define creativity in the first place.  If you use a broad, general definition without any reference to how it works in the mind itself, as in the ability to produce something novel or new, computers are clearly creative at this point, but if you introduce more nebulous concepts like inspiration and engagement in the overall creative process, the answer is a lot less clear.  For the human artist, creating something is a process that combines the conscious and the unconscious.  We experience some moment of inspiration that springs from an unknown source, so unknown that some have attributed it to a spiritual, even divine, origin, and we reflect upon it consciously, analyzing, dissecting, iterating, imagining, and so on.  We then go to work on our creation, using some combination of conscious and unconscious processes to produce the final output.  Computers, however advanced they may seem, do not have conscious experiences and are incapable of this type of interplay.  Instead, the Generative AI model is all there is, meaning the computer creates content using an entirely statistical approach.  Interestingly, this approach is further differentiated from our own because the models themselves are general in nature and do not “care” whether they are tasked with processing words, images, sounds, or video.  The content type in question is handled entirely by the training data and the relationships learned from it.  This is one of the reasons AI has exploded since ChatGPT was released in 2022.  Software companies didn’t need to invent new models to handle different mediums.  They simply needed to retrain the models and adjust the relationships, to the point that Google believes the same Gemini AI built into search and our phones can be trained to drive a car.
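
To circle back to those parameters for a moment: one concrete example is the sampling “temperature” used by many generative models.  Rather than always taking the single most likely next word, the model samples from its predicted distribution, which is part of why the same prompt produces different results each time.  A toy sketch, with made-up scores that don’t reflect any particular product’s internals:

```python
import math
import random

# Hypothetical scores a model might assign to candidate next words (made up for illustration).
candidate_scores = {"rabbit": 2.0, "fox": 1.5, "burrow": 0.8, "moon": 0.1}

def sample_next_word(scores, temperature=1.0):
    """Convert scores into probabilities (a softmax) and sample one word."""
    scaled = [s / temperature for s in scores.values()]
    total = sum(math.exp(v) for v in scaled)
    probabilities = [math.exp(v) / total for v in scaled]
    return random.choices(list(scores.keys()), weights=probabilities, k=1)[0]

# The same "prompt" (the same scores) yields different words on different runs;
# a higher temperature flattens the distribution and increases the variety.
for temperature in (0.2, 1.0, 2.0):
    samples = [sample_next_word(candidate_scores, temperature) for _ in range(8)]
    print(f"temperature={temperature}: {samples}")
```

Run it a few times and the low-temperature line stays boringly consistent while the high-temperature line wanders, a crude analogue of the variation described above.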

Though we can’t say for sure, human processing mechanisms for specific mediums are likely far more specialized, having evolved down their own distinct pathways for sight, sound, motion, and so on over millions upon millions of years.  We can infer this for two reasons.  First, facility with one medium doesn’t translate to another.  A person can be an excellent writer but a terrible visual communicator.  Similarly, damage to one area of the brain doesn’t necessarily impact another.  We can be totally blind, yet still create some of the greatest poetry ever conceived, as John Milton did, composing the immortal Paradise Lost after he had gone completely blind.  Second, even within a specific capacity such as sight, the brain has specialized sensors and processing components for different aspects of an image like color, depth, and motion.  While the mechanisms might be unclear, these capacities use different receptors, are routed through different areas of the brain, and are likely to rely on different algorithms, for lack of a better word, suggesting that unlike computers, humans do not possess some kind of general model, but rather a set of highly specialized ones.  How this affects our creative capacity compared to a computer’s might be unclear, but the differences are just as likely to be meaningful.  More important and more distinct, however, is the capacity for conscious experience, that level upon which humans look upon the output of their unconscious minds and make judgements.  We do not merely experience a moment of inspiration, where something new springs into being.  We recognize it as something new and evaluate its relative worth.  Sometimes an idea sucks and is discarded.  Other times, it’s deemed worthy of further exploration based on a unique combination of rational analysis and emotional experience.  At least at this point, computers possess nothing like this.  They don’t know what their output is in any meaningful sense, much less how to evaluate it.  ChatGPT could come up with a play greater than Hamlet right now, but to the computer it would remain the output of a prompt, a bunch of ones and zeroes behind the scenes, leaving a human to recognize its significance.

Conceivably, the computer could run the final output back through the model, comparing its creation to the training data and assigning some type of score based on uniqueness, quality, or whatever, but that alone wouldn’t suffice because the computer would simply be comparing its work to what was already known, not what was yet to be.  To be sure, how we do this ourselves remains entirely unclear, and yet the fact that we can recognize an idea, however outlandish, and realize its significance is the source of human progress.  More often than not, an advance in a given field is made when someone outside the establishment comes up with something new, recognizes its importance, and builds on it, as Albert Einstein did in his miracle year of 1905; only then is it internalized by the field itself.  In other words, the work of Einstein and others didn’t exist in the training set when they began.  How a computer could do this remains elusive, and might for the foreseeable future.  While there is no proof in principle that it cannot be achieved by some kind of new statistical prediction model that essentially runs on top of the existing generative ones, similar in a sense to the way our conscious minds operate on top of our unconscious, there is no candidate model, or even an idea for one, at this time.  For now, therein lies a critical difference between humans and machines.
