Does AI Get the Kiki Bouba Effect?
Whatever side you’re on in the AI debate (will it take our jobs? can it replace an entire editorial team?), I think we can agree on one thing: LLMs excel at scratching highly specific, nerdy language itches. Say, for example, you want to measure the language in a news article against the Kiki-Bouba effect. Or maybe you want to determine the overall Bouba- or Kiki-ness of the day’s news. Then yeah, AI’s your guy.
(To be fair, my partner and I came up with this while out for beers, so it could be a case of “you had to be there”. Nevertheless, I went ahead and played with it the next day and found the results kind of interesting. I was also interested in the broader question: does AI get Kiki Bouba?)
What in the Bouba-Kiki am I talking about?
The Bouba-Kiki effect is a non-random tendency to associate certain speech sounds with visual shapes, observed across a wide range of languages and cultures. It’s considered a form of sound symbolism, or ideasthesia – assigning visual and conceptual qualities to a word based on the way it sounds. Kind of like synaesthesia, but instead of the letter A being red (if you know, you know) it’s certain words sounding sharp or rounded, aggressive or friendly.
The theory emerged in 1924, when a Georgian psychologist, Dimitri Uznadze, tested 10 individuals on how they formed associations between a list of nonsense words and a series of drawings. Psychologist Wolfgang Köhler later redid Uznadze’s experiment in his 1929 book, where he showed readers two shapes – a spiky one and a rounded blob – and asked which should be called “takete” and which “maluma”. When the experiment was repeated in 2001 on groups of American and Tamil-speaking Indian students, the names were changed to Kiki and Bouba.
Kiki on the left, Bouba on the right.
Some researchers think sound-shape associations like this one predate the development of language in infants and can be found across all kinds of written languages, not just those that use the Latin alphabet. It’s believed the “softness” and “spikiness” are related to combinations of vowels and consonants, or to the shape your mouth makes when you say “kiki” and “bouba”. Rounded, soft (voiced) consonants like “b”, “m” and “l” and longer vowel sounds such as “ooh” and “uh” sound more Bouba, while sharp, staccato consonants like “k”, “t”, “p”, “s” and “z” paired with short, sharp vowels (“ee”, “ih”, “ay”) are Kiki.
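For the curious, here’s a toy version of that heuristic in Python. It reads spelling rather than actual pronunciation, and the letter groupings are just the ones above, so treat it as a rough sketch, not phonetics:

```python
# Toy Kiki/Bouba scorer based on the letter classes described above.
# Orthography only: it looks at spelling, not real pronunciation,
# so it's a crude approximation at best.
KIKI_LETTERS = set("ktpsz")   # sharp, staccato consonants
BOUBA_LETTERS = set("bml")    # rounded, voiced consonants
KIKI_VOWELS = set("ie")       # rough stand-ins for "ee", "ih", "ay"
BOUBA_VOWELS = set("ou")      # rough stand-ins for "ooh", "uh"

def kiki_score(word: str) -> float:
    """Return a score in [0, 1]: 1.0 = maximally Kiki, 0.0 = maximally Bouba."""
    kiki = bouba = 0
    for ch in word.lower():
        if ch in KIKI_LETTERS or ch in KIKI_VOWELS:
            kiki += 1
        elif ch in BOUBA_LETTERS or ch in BOUBA_VOWELS:
            bouba += 1
    total = kiki + bouba
    return 0.5 if total == 0 else kiki / total

print(kiki_score("kiki"))   # 1.0 -- all sharp sounds
print(kiki_score("bouba"))  # 0.0 -- all rounded sounds
```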
The connotations also apply to names and conceptual meanings. “Molly” generally sounds more Bouba than “Kate”, which is more often associated with Kiki. Kiki words usually call to mind determination, precision, even severity or aggression, while Bouba feels harmless, laid-back and welcoming. On a scale from one to the other, Kiki is “killing” and “non-consensual” and Bouba is “young”, “pup” and “flowing”.
But just like the semantic field of “food” includes “fork”, “sweet” and the colour beige (aka the tastiest, least nutritious food group), the Kiki-Bouba extended universe can include colours, tastes, textures, genders and celebrity crushes. Back in 2023 TikTok caught wind of the phenomenon and ran with it. In one viral video, content creator Talia Lichtstein explained the general gist of it before diving into which actors were Bouba or Kiki (Robert Pattinson – Kiki, Jon Hamm – Bouba, in case you were curious). She added that she’d always known her type was Bouba men, and now she had a (semi-scientific) explanation for it.
A number of lifestyle sites expanded on the TikTok trend, with Women’s Health announcing that 2023 was the year of Bouba men. High testosterone is “responsible for Kiki-like features such as a strong jaw, heavy brow ridges, and higher cheekbones,” they wrote, while Boubas generally have higher estrogen levels, “which means that on top of having softer features, they also are long-term thinkers, more holistic, and have strong linguistic skills.” They obviously didn’t see Ratboy summer 2024 coming.
The summer of thirsting after Kiki men and…Stuart Little? Are the straights ok?
The New York Times even devised a quiz for subscribers to test their knowledge of the effect. Along with easy sets like cat vs dog and cowboy boots vs Uggs, users were tasked with ranking the Bouba- and Kiki-ness of Beatles members. But the best part was that you could see how other readers answered and how common your own responses were.
The idea of applying Kiki-Bouba to language is far from new, and it’s been contested in the past. Cross-cultural studies have shown that speakers of Mandarin Chinese, Turkish, Romanian and Albanian are less likely to make the typical associations, and the effect is also less pronounced in visually impaired people who are asked to feel a shape and then rate it. So, this isn’t a universally reliable way to measure how a word feels.
But does AI get it?
Kind of. But only inasmuch as the person writing the prompt does. Language models are only as creative as the input we give them and the meaning we ascribe to the output.
Models like ChatGPT or DeepSeek have been trained on data found all over the web, so of course they can explain to users what the effect is and how it’s commonly applied. They also make a decent sentiment-analysis tool, and they can even apply the effect to texts you want rated for their Kiki- and Bouba-ness.
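Roughly the kind of setup I used, sketched with OpenAI’s Python client. The model name, the 0–100 scale and the JSON-only instruction are my assumptions here, not a canonical recipe:

```python
# Sketch: ask a chat model to rate a text's adjectives on a Kiki-Bouba scale.
# Assumes the `openai` package and an OPENAI_API_KEY in the environment.
import json
from openai import OpenAI

client = OpenAI()

PROMPT = (
    "For each adjective in the text below, rate how Kiki (sharp, spiky) or "
    "Bouba (soft, rounded) it feels on a 0-100 scale, where 100 is maximally "
    'Kiki. Reply with JSON only, e.g. {"reckless": 78, "young": 22}.\n\nTEXT:\n'
)

def rate_adjectives(article_text: str) -> dict[str, int]:
    """Ask the model for per-adjective Kiki scores; returns word -> score."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{"role": "user", "content": PROMPT + article_text}],
    )
    return json.loads(response.choices[0].message.content)
```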
For example, I fed a random Guardian news article into both models, asking them to rate only the adjectives based on how Kiki/Bouba they are. The first problem I ran into was that neither of the models could properly distinguish between adjectives, adverbs and nouns. That makes sense when you consider that LLMs don’t really “know” what words are, since they operate on tokens instead. Tokens are the smallest units of input and output in natural language processing, and sometimes they are words, but often they’re word fragments, syllables or isolated characters. A good way of thinking about this is: you can ask a model to give you a summary of the main themes of Catcher in the Rye, but you can’t expect it to correctly count how many “r”s there are in the word “strawberry”.
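You can see this for yourself with the tiktoken library, which implements the tokenizers OpenAI’s models use (the encoding name below is the GPT-4-era one; other models split words differently):

```python
# How a model "sees" a word: sub-word tokens, not letters.
# Requires the tiktoken package.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
tokens = enc.encode("strawberry")
print([enc.decode([t]) for t in tokens])
# Prints a handful of sub-word chunks -- nowhere in them is an
# individual letter "r" the model could simply count.
```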
ChatGPT correctly listed “non-consensual”, “young” and “reckless” as adjectives, but didn’t spot others, like “specific”, at all. It included adverbs like “profoundly” and nouns like “privacy” and “acquaintance” in that category, too. DeepSeek also dropped the ball and mixed adjectives with adverbs, but did a better job of spotting both in the text. Still, a lot of adjectives got overlooked, and both LLMs counted the plural noun “survivors” as an adjective. So, I was working in relatively murky waters.
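For comparison, getting a ground-truth adjective list is a solved problem in classical NLP. A quick sketch with spaCy (assuming the small English model is installed) would have given me a cleaner starting point:

```python
# A ground-truth adjective list via spaCy's part-of-speech tagger.
# Setup: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

def adjectives(text: str) -> list[str]:
    doc = nlp(text)
    # Keep only tokens tagged as adjectives; nouns like "survivors" and
    # adverbs like "profoundly" get filtered out here.
    return sorted({tok.text.lower() for tok in doc if tok.pos_ == "ADJ"})

print(adjectives("The reckless young survivors spoke profoundly."))
# ['reckless', 'young']
```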
The two lists of Bouba and Kiki adjectives were very different, but when asked to explain their reasoning, the models gave virtually identical answers. Both justified their choices as focussing on the “sound, meaning and imagery” (DeepSeek) / the “phonetics, meaning, and shape perception” (ChatGPT) of each adjective (that they could detect). They also factored in syllable structure and rhythm.
DeepSeek’s response
Interestingly, ChatGPT’s answer had a third category: neutral, for adjectives it couldn’t place on the first try. When instructed to nevertheless analyse them and assign a category, it put “legal” and “intentional” in Kiki due to their connotations of precision and hard “t” and “g” sounds. “Privacy” was Bouba because it suggests protection and enclosure.
ChatGPT’s response
Each model had slightly different criteria for what makes a word Kiki or Bouba. ChatGPT, for example, counted voiced consonants such as “d” and “g” as Kiki (sharp), while DeepSeek put those in the Bouba (mellow) category. GPT also focussed much more on how a word sounds than on what it means, so we ended up with some pretty dark or practical Boubas: “homeless”, “commercial”, “survivors” and “acquaintances” (those last two aren’t even adjectives, but nevermind). On the other hand, words that I think would’ve fit better were absent from this category: “profoundly”, “vulnerable” and “due”.
The Kikis were also a mixed bag. DeepSeek’s answer was more complete and closer to what I personally would have put in this category. The reasoning leaned heavily on meaning in context and connotation, less on how a word sounds. Still, both models agreed on the Kiki-est words: “civil”, “repeated”, “non-consensual”, “disturbed”, “difficult” and “penniless”.
Overall, Kiki adjectives dominated the article. When instructed to rank all the words (adjectives or otherwise) and come up with a global score for the whole article, both DeepSeek and ChatGPT rated it overwhelmingly Kiki, with an average score of 74%.
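The global score itself is just an average over per-word ratings – something like this, where the ratings dict stands in for the hypothetical output of the earlier sketch (the numbers are made up for illustration):

```python
# Fold per-word Kiki ratings (0-100) into one article-level score.
def article_kiki_score(ratings: dict[str, int]) -> float:
    return sum(ratings.values()) / len(ratings)

ratings = {"non-consensual": 92, "reckless": 80, "young": 25, "civil": 88}
print(f"{article_kiki_score(ratings):.0f}% Kiki")  # -> 71% Kiki
```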
So, does this mean AI “understands” the Kiki-Bouba effect? Probably not in the way you and I would. In fact, when I ran the list of actors from Lichtstein’s thirsty TikTok through DeepSeek and asked it to rank them, I got almost completely different results. The Kikis and Boubas were pretty much reversed, mainly because DeepSeek focussed on the way the names sound and the kinds of roles the actors are best known for. Also, as far as we know, the LLM doesn’t have a “type”, and it’ll be a long time before we can get it to get vibes.
The obvious follow-up was to prompt for a description of each actor in under 50 words, focussing on their appearance (Lichtstein’s criterion in the video), and then ask for a fresh Kiki-Bouba ranking. So that’s what I did next – and finally got a ranking similar to the one from the viral TikTok. On a scale from most Kiki to most Bouba, we have: Austin Butler > Robert Pattinson > Zac Efron > Oscar Isaac > Michael B. Jordan > Jon Hamm > Jason Segel. I would’ve switched some of these around (Jon Hamm is the most Bouba, in my opinion), but that’s good enough.
The New York Times would like to know
I did try one more thing, though. I attempted to get DeepSeek to categorise everyday things, such as cats, dogs, spoons and forks, to see if it would follow the trends in this TikTok video. It didn’t. Though it did a decent job of singling out the obviously-Kiki (fork, train) from the very-Bouba (spoon, dog), it missed a lot of the nuance behind why cats are Kiki 90% of the time.
So, while this was a fun experiment, it reinforced what I already knew: language models are a lot of fun, but they’re only as useful as your prompt. You can’t expect them to “understand” Kiki and Bouba without first explaining in detail what exactly there is to understand here. As Lichtstein put it in her viral TikTok, if you get it, you get it; if you don’t, you don’t. And AI, it seems, mostly doesn’t.