No, AI cannot translate animal sounds into human speech because animals don’t freaking talk

For some reason, the experts are claiming we can translate animal speech that doesn’t really exist in the same sense as human speech, and they are proposing to do so with only half the necessary data, the sounds alone without the show, so to speak.

Late last year, the Coller Dolittle Prize announced that researchers who “crack the code” of animal speech can win up to $500,000, setting off something of a media frenzy about the potential to leverage AI technology as a translation mechanism between the human and animal worlds.  As Wired phrased the question, “In 2025 we will see AI and machine learning leveraged to make real progress in understanding animal communication, answering a question that has puzzled humans as long as we have existed: ‘What are animals saying to each other?’” They went on to describe the prize itself as “an indication of a bullish confidence that recent technological developments in machine learning and large language models (LLMs) are placing this goal within our grasp.”  After briefly describing the fundamental challenges facing such an effort, “scientists rarely know whether a particular wolf howl, for instance, means something different from another wolf howl, or even whether the wolves consider a howl as somehow analogous to a ‘word’ in human language,” what I would call two rather critical pieces of information for such an enterprise, they proceeded to declare that “2025 will bring new advances, both in the quantity of animal communication data available to scientists, and in the types and power of AI algorithms that can be applied to those data.”

Six months later, The Guardian covered the same topic, declaring without any real evidence that “We’re close to translating animal languages – what happens then?  AI may soon be able to decode whalespeak, among other forms of communication – but what nature has to say may not be a surprise.” From there, they continued in much the same vein as Wired: “This is a race fuelled by generative AI; large language models can sort through millions of recorded animal vocalisations to find their hidden grammars. 
Most projects focus on cetaceans because, like us, they learn through vocal imitation and, also like us, they communicate via complex arrangements of sound that appear to have structure and hierarchy.”  As preliminary proof of this, they claimed we’ve already translated some dolphin speech into human language: “The linguistic barrier between species is already looking porous. Last month, Google released DolphinGemma, an AI program to translate dolphins, trained on 40 years of data. In 2013, scientists using an AI algorithm to sort dolphin communication identified a new click in the animals’ interactions with one another, which they recognised as a sound they had previously trained the pod to associate with sargassum seaweed – the first recorded instance of a word passing from one species into another’s native vocabulary.”

While this minor achievement might be true, it should also go without saying that finding an analogue between a sound and a word doesn’t mean animals talk the way we do, or that the entire enterprise isn’t doomed to failure.  Rather than glossing over whether or not animals actually use words in any meaningful sense, we should begin by considering whether they are even capable of speech as it is being described.  Even setting aside that they lack the necessary areas of the brain, the specialized vocal cords, and all the rest of our biological adaptations that make what most of us consider speech possible, they do not seem very interested in it to begin with, even when they have been taught.  Outside of a cartoon, your puppy isn’t desperate to have a conversation with you, denied the ability only because we don’t have a shared language.  We know this because, as the late great linguist Derek Bickerton once pointed out, we can train chimpanzees and other apes to learn sign language, and it turns out that they are actually pretty good at it, but there’s no lightbulb that goes off in their brains compelling them to start working on a Shakespearean play or other masterwork, or even to start using it in their own lives.  Beginning in the 1960s, Washoe was the first chimpanzee to be so taught; she managed to learn hundreds of signs, some 350, and had the ability to combine them into phrases, some entirely new.  In a double-blind study, she was able to identify objects 86% of the time, which might well be better than some humans if those “man on the street” interviews are a representative sample.  She even attempted to teach her adopted son, Loulis, a few signs, but without human intervention, he did not reach her capabilities, much less build upon them.  Other efforts weren’t even as successful.  
A decade later, psychologist Herbert Terrace trained Nim Chimpsky (a play on the famed, if fatally flawed in his theories, linguist Noam Chomsky) to use some 125 signs by raising him in a human family, but Nim never managed to acquire true grammar or syntax.  Instead, Dr. Terrace concluded that his achievements were the result of the “Clever Hans” effect, named after a horse that was said to be able to count: a combination of picking up subtle cues from the trainer and the desire to earn a reward for his efforts.  Koko, perhaps the most famous of them all, learned over a thousand signs between the 1970s and 2018, and could understand some spoken English.  Similarly, Kanzi, a bonobo, learned using a keyboard and lexigrams, and was said to spontaneously create multi-word requests, but to this day, the scientific community remains divided on what these studies mean, other than that our closest cousins are extremely clever, which should not be surprising.  Critics, including Professor Chomsky, continue to insist the effort is entirely reward-based and that no real language is acquired.  Proponents believe we are seeing evidence of at least limited language ability, including displacement (referencing things not present), productivity (inventing new signs), and cultural transmission (teaching it to others); more on these in a moment.

In the meantime, it’s clear the result is far from Planet of the Apes either way, where one ape teaches the entire species how to talk and then proceeds to take over the world, but why not?  If animals were capable of speech, we can assume the most intelligent and communal among them would be among the most capable, especially given that they have some capacity to learn words and simple phrases. Given the utility of speech to people, where we would not be who we are, either as individuals or collectively as a species, without it, we can certainly imagine that another species might well feel the same given this gift, using it to the best of their ability, and yet they do not. Not only does their ability appear incredibly limited, failing by most definitions to constitute actual speech, they see no value in it except for earning more treats.  Some may object that simply because animals can’t learn human speech or use it properly doesn’t mean they don’t have their own speech, perhaps one they prefer above ours. If you are proposing to use AI to translate between the two, however, that implies there must be some equivalence, a mapping between the species’ communication mechanisms, which would imply that any additional learned facility could be put to use somehow.  After all, we can easily translate between human languages, and humans have no problem learning multiple languages with sufficient effort. Presumably, if we met an alien species, we would do the same and would use whatever was better in that new tongue than in our own. If primates were capable of real speech, why wouldn’t they take advantage of learning something entirely new and useful? Furthermore, this goes beyond learning languages from others.  Humans can not only learn languages; we can create them, and rather incredibly, this creation process doesn’t appear to require any special instruction, even at a young age and even if the participants cannot speak or hear. 
Most notably, a group of deaf children in Nicaragua in 1977 developed a complex sign language entirely on their own.  This was discovered accidentally, if you can believe it.  At the time, Nicaragua had no cohesive deaf community or consistent sign language, and the government founded a school in Managua to provide special education to change that.  Initially, the plan was to teach the students spoken language via lip reading, but something entirely unexpected happened.  As the students socialized, they began teaching each other the simple techniques they used to communicate at home, sharing and expanding upon their simple signs.  At first, the teachers viewed their efforts as a failure, believing the children weren’t learning and were simply mimicking each other, but in 1986, they invited MIT-trained linguist Judy Kegl to study the situation.  Rather than simple signs learned from home, she found that the children had created complex structures, including verb agreement and grammatical rules, meaning they had created a basic language.  The famed psychologist and author Steven Pinker described the result: “The Nicaraguan case is unique in history…We’ve been able to see how it is that children—not adults—generate language. . . . It’s the only time that we’ve seen a language being created out of thin air.”  Incredibly, the language these children created without any instruction went on to become the national standard, Nicaraguan Sign Language (NSL), and it is considered entirely distinct from the country’s Spanish-speaking roots, not merely a signed version of Spanish.

The difference between what human children did on their own and what primates were capable of only with significant instruction brings us back to the earlier construction.  For speech to be truly speech, it requires referencing things not present in the environment, inventing new signs and structures, and teaching it to others.  Even under controlled conditions, our closest cousins have not demonstrated the ability to do so, while humans do it without any instruction at all, their own teachers having no idea what they were up to.  Bickerton and others have taken this to mean that there are deep underlying features and capacities in our brain that make this happen, things not present in animals, and if anything, the supposed breakthrough in dolphins described above proves the point.  The new sound they discovered appeared to refer to something present in the dolphins’ environment, sargassum seaweed, meaning it was a reaction to a stimulus, not an example of real language based on an abstraction, something that wasn’t there and was merely conjured in the mind at the time.  Bickerton, in particular, viewed this as a crucial distinction.  In his view and mine, animals clearly communicate about aspects of their environment, and some do so in complicated ways.  There are species of primate that use different calls to alert other members of their troop to various types of predator, for example.  When they see a hawk or other hunting bird in the area, they make one sound.  When they see a snake, another.  The other members of the troop respond accordingly, meaning they are able to differentiate between the two messages.  Presumably, wolves do the same when they howl, and the dolphins do the same with seaweed along with many other objects, and equally presumably, some of these behaviors are learned rather than instinctual.  At the same time, this is not speech because there is no abstraction.  
Speech requires a referent independent of the object, a symbolic representation of what is not actually there.  When humans talk about snakes or hawks, they can speak about one present in the environment, one they saw at a zoo, one they imagined entirely, or the entire class of snakes and hawks without referring to any individual.  This makes language both incredibly rich and extraordinarily messy.  Even between established human languages, there are words that simply do not translate from one to the other.  The German Schadenfreude is a classic example.  Literally, it means harm-joy, but the closest English translation requires almost an entire sentence: experiencing joy at another’s suffering.

Taken to the extreme, the philosopher W. V. O. Quine proposed the thesis of the indeterminacy of radical translation, which holds that no two translation manuals between an established and a completely foreign language need be the same.  The idea is both simple and extremely subtle.  Imagine a linguist tasked with developing a translation manual for a culture that has never before been encountered and for which there are no bilingual speakers.  Because the linguist cannot look inside the minds of their subjects, they are limited to observing what stimulus prompts the subject to utter a word or a phrase, what Quine referred to as “stimulus meaning,” but this is necessarily an imperfect and incomplete process.  There is no way for the linguist to know whether they have captured every possible stimulus that would produce the word or phrase, especially considering there is an extremely high – if not near infinite – number of potential prompts for a complex expression, both real and imagined, some of which will vary from speaker to speaker even within the same culture.  Therefore, if two linguists were to perform the same translation process, their translations would not necessarily be the same, nor would there be any objective means to determine which was more accurate.  If this is true for human languages, which presumably follow some of the same rules given that they originate in minds organized in a similar fashion, it would seem insurmountable for an interspecies language, even if we assume the species in question were capable of actual speech.  Consider a bat, whose primary sense is echolocation, using sound waves much the way we use sight.  How does the concept of stimulus meaning even apply when, virtually by definition, we cannot access the same stimuli?  This points to yet another challenge with the enterprise.  
Wired references the rapidly expanding set of animal recordings as being critical to the translation effort, noting, “Automated recording of animal sounds has been placed in easy reach of every scientific research group, with low-cost recording devices such as AudioMoth exploding in popularity.  Massive datasets are now coming online, as recorders can be left in the field, listening to the calls of gibbons in the jungle or birds in the forest, 24/7, across long periods of time,” but the sounds alone are not enough for a translation.  The stimulus that prompts the sounds is required, even for basic communication, let alone full-fledged speech.

In other words, the experts are claiming that we can translate animal speech that doesn’t really exist in the same sense as human speech, and should more properly be described as stimulus-based communication, and they are proposing to do so with only half the necessary data, the sounds alone without the show, so to speak.  To me, this suggests that they are not entirely serious and have other motives in mind.  Beyond once again demonstrating what can only be described as an extreme ignorance about the natural world and our own unique place in it, this appears to be yet another incarnation of the “all animals are sentient” philosophy, and it will inevitably be used to criticize human civilization and our impact on the environment.  Thus, it wasn’t surprising when The Guardian chose to emphasize exactly this point in the same article, claiming that “in the excitement we should not ignore the fact that other species are already bearing eloquent witness to our impact on the natural world. A living planet is a loud one. Healthy coral reefs pop and crackle with life. But soundscapes can decay just as ecosystems can. Degraded reefs are hushed deserts…Where it counts, we are perfectly able to understand what nature has to say; the problem is, we choose not to. As incredible as it would be to have a conversation with another species, we ought to listen better to what they are already telling us.”  In other words, they will tell us the animals are insisting we are killing them and destroying the planet, and we need to stop, but what else is new?