Saturday, May 21, 2011

Liar brains

Because most human languages use sound as the primary medium for expressing meaning ( = getting another's brain to roughly and partially replicate our own brain's patterns of electrochemical activity; putting thoughts in other people's heads), we tend to think of the task of understanding other people's words as primarily an auditory one. People speak, and we listen. True enough (although by no means as simple as it may appear at first glance). However, there's more to speech perception than mere listening. You may be familiar with the McGurk effect:


As this video demonstrates, we rely on more than just listening when comprehending language. We integrate auditory and visual information. Suppose we watch a video in which the picture shows a person saying "ga, ga, ga" (the sound [g] is produced in the back of the mouth, with the back of the tongue touching the soft palate), while the audio track has the person saying "ba, ba, ba" (the sound [b] is produced in the front of the mouth, with the lips touching). The information from the two channels gets conflated, and our brain ends up telling us that we're hearing the intermediate sound [d] ("da, da, da"), produced by the tip of the tongue touching the ridge behind our upper front teeth.

OK, so we don't just listen to speech, we also watch it. But, as I mentioned above, listening is not "just listening". One problem faced by any system trying to comprehend human speech, whether wetware or software plus hardware, is that speech doesn't come in distinct sound units. While we may think of words such as cat as consisting of separate sounds (in this case a consonant, a vowel, and another consonant), the reality is quite different. If you made a recording of a person saying the word cat and then tried to cut it up into three distinct parts, the way you could cut up the printed version of this word:

c | a | t

you'd be in for a surprise. Wherever you decided to cut, you wouldn't be able to isolate individual sounds. This is because of a phenomenon called coarticulation: when you talk, neighboring sounds tend to overlap or run into each other, so that you get an overall "smudged" effect, referred to as parallel transmission (meaning that, while we may think we're only saying one sound at a time, we're actually saying multiple sounds simultaneously).

It gets more complicated. We tend to think of the first sound in cut and cat as essentially the same (let's use the symbol /k/ to write down this sound). However, with some not too complicated equipment, you can see that the physical sound waves our brains perceive as /k/ are rather different in these two words. Similarly, the first sounds of kestrel, cot, and kit, as well as the second sound of skin and the last sound of disk, all have distinct physical signatures. They're different sounds! What's more, the /k/ sounds in cat as pronounced by a toddler, a female adult, and a male adult are all physically different, too! Somewhat convolutedly, linguists dub this the lack of invariance. (Yes, we like to complicate things unnecessarily. Wouldn't variance suffice? Well, perhaps it's not as emphatic...) Anyway, the fact remains that our brains are capable of hearing a bunch of acoustically very different signals, deciding that these differences are unimportant and writing them off as such, and then making us believe that we are in fact hearing the same sound, /k/, in all these words.

Not convinced? Well, you don't need a spectrograph (or anything of the sort) to become a believer. The first sound of kit and the second sound of skit are really not the same. A classic way of demonstrating this to beginning students of linguistics is to have them pronounce these words in front of a burning candle or while holding a handkerchief in front of their mouth. The /k/ of kit comes with a puff of air (aspiration) that makes the flame flicker and the handkerchief flutter; the /k/ of skit has no such puff. Even though your brain tells you that you're hearing /k/ in both cases, the candle and the handkerchief would beg to differ! You can try this and see for yourself.

Interestingly, if you sought out monolingual speakers of Thai, Danish, or Hindi and asked them to listen to you saying the English words kit and skit, they would most certainly tell you that (Why, of course!) the first sound of the former and the second sound of the latter word are different sounds! This is because (unlike English) Thai, Danish, and Hindi have words with different meanings that only differ in that one of them has the /k/ of kit and the other the /k/ of skit. However, your Thai, Danish, and Hindi speakers would still hear the first sound of cat and the first sound of cut as the same sound, just like you do.
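
If you like to think in code, here is a toy sketch of that idea. It is only an illustration, not a model of how any brain or speech recognizer actually works: the labels aspirated_k (the puffy /k/ of kit) and plain_k (the /k/ of skit), the table CATEGORY_MAP, and both function names are made up for this example. The point is just that the same two physical sounds get sorted into one bin by an English-trained brain and into two bins by a Thai-trained one.

```python
# Toy illustration: two physical sounds, grouped differently by language.
# All names here are hypothetical labels invented for this sketch.
CATEGORY_MAP = {
    # English lumps both sounds into a single category, /k/.
    "English": {"aspirated_k": "/k/", "plain_k": "/k/"},
    # Thai keeps them apart, because the difference can distinguish words.
    "Thai": {"aspirated_k": "/kh/", "plain_k": "/k/"},
}


def perceive(language, sound):
    """Return the category a listener of `language` files `sound` under."""
    return CATEGORY_MAP[language][sound]


def heard_as_same(language, sound_a, sound_b):
    """True if a listener of `language` hears the two sounds as one."""
    return perceive(language, sound_a) == perceive(language, sound_b)


print(heard_as_same("English", "aspirated_k", "plain_k"))  # True
print(heard_as_same("Thai", "aspirated_k", "plain_k"))     # False
```

The physical input is identical in both calls; only the lookup table changes. That table is the "lie" each brain has learned to tell.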

In other words, your brain lies to you all the time. And mine lies to me. They deceive us into believing that we're hearing sounds we're not really hearing. They also make us believe that physically different sounds are identical and that words consist of individual, separately articulated sound units. What's more, they trick us into believing that speech is a string of individual words, separated by pauses, but the reality is actually quite different. We may mean, "What. Did. You. Say.", but we say, "Whadijasay." We run words together just like we mush sounds together (which becomes patently obvious when you try to make out individual words while listening to a language you don't speak). Try Xhosa, for instance:


Jaseethat?

The brain is a liar. And it lies to us Con. Stan. Tly. Not just when speaking or comprehending speech.

Of course, there are very good reasons that this is so. After all, we need to make sense of the world, respond to it quickly and efficiently, and survive long enough to procreate. (Or eat more chocolate cake. Whatever motivates you!) Imagine how bogged down our brains would get if they weren't able to group phenomena into categories. Imagine barely surviving an encounter with Leopard A only to be completely baffled by the sight of your next leopard, Leopard B, a couple of days later just because Leopard B had longer whiskers, a broken claw, a different pattern of spots on its coat, or even no spots at all! Clearly, your brain needs to be able to tell you, "This is a leopard. Run!" rather than "Hey, let's check out this kitty cat."

Similarly, your brain needs to deceive you every now and then (or all the time, really) in order to be able to use language efficiently (or indeed at all).

And language is probably the most powerful evolutionary adaptation humans have ever undergone. Language gets you abundant progeny. And a lot of cake.
