Have you ever noticed that when you’re in a crowded room filled with loud voices you’re still able to focus on one person talking to you, and can ignore other simultaneous conversations? Thanks to scientists from the University of California, San Francisco, the mystery of the so-called “cocktail party effect” has been solved.
To understand how selective hearing works in the brain, UCSF neurosurgeon Edward Chang, MD, a faculty member in the UCSF Department of Neurological Surgery and the Keck Center for Integrative Neuroscience, and UCSF postdoctoral fellow Nima Mesgarani, PhD, worked with three patients who were undergoing brain surgery for severe epilepsy.
Part of the surgery involved pinpointing the parts of the brain responsible for the patients’ disabling seizures. The UCSF epilepsy team found those locations by mapping the brain’s activity over a week, using a thin sheet of up to 256 electrodes placed under the skull on the brain’s outer surface, or cortex. These electrodes record activity in the temporal lobe, home to the auditory cortex.
In the experiments, patients listened to two speech samples played simultaneously, each a different phrase spoken by a different speaker. They were asked to identify the words they heard spoken by one of the two speakers.
The scientists then applied new decoding methods to “reconstruct” what the subjects heard by analyzing their brain activity patterns. The authors found that neural responses in the auditory cortex reflected only those of the targeted speaker. Their decoding algorithm could predict which speaker, and even which specific words, the subject was listening to based on those neural patterns. In other words, they could tell when the listener’s attention strayed to another speaker.
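The general idea behind this kind of decoding can be illustrated with a toy sketch. This is not the authors’ method: it uses synthetic data and a simple nearest-centroid decoder, and all names and numbers (electrode count, noise level, the two “speaker” patterns) are made up for illustration. The premise is only that attending to different speakers produces distinguishable patterns of cortical activity, which a decoder trained on labeled trials can then classify.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (not the study's actual data or algorithm): each "trial"
# is a vector of electrode responses. Attending to speaker A vs.
# speaker B shifts the mean response pattern across electrodes.
n_electrodes = 64
pattern_a = rng.normal(size=n_electrodes)  # hypothetical mean pattern, speaker A
pattern_b = rng.normal(size=n_electrodes)  # hypothetical mean pattern, speaker B

def simulate_trial(attended, noise=0.5):
    """Simulate one trial's electrode responses while attending to a speaker."""
    base = pattern_a if attended == "A" else pattern_b
    return base + noise * rng.normal(size=n_electrodes)

# Fit a nearest-centroid decoder on labeled training trials.
train = [(simulate_trial(s), s) for s in ["A", "B"] * 50]
centroids = {
    s: np.mean([x for x, lab in train if lab == s], axis=0) for s in ("A", "B")
}

def decode(trial):
    """Predict which speaker was attended from a neural response vector."""
    return min(centroids, key=lambda s: np.linalg.norm(trial - centroids[s]))

# Evaluate on fresh simulated trials.
test = [(simulate_trial(s), s) for s in ["A", "B"] * 25]
accuracy = float(np.mean([decode(x) == lab for x, lab in test]))
print(f"decoding accuracy: {accuracy:.2f}")
```

The point of the sketch is the attention-dependent signal itself: because the cortex represents mainly the attended speaker, even a very simple classifier can recover who the listener was focusing on from the neural pattern alone.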
“The combination of high-resolution brain recordings and powerful decoding algorithms opens a window into the subjective experience of the mind that we’ve never seen before,” Chang said. “The algorithm worked so well that we could predict not only the correct responses, but also even when they paid attention to the wrong word.”
The new findings show that the representation of speech in the cortex does not simply reflect the entire external acoustic environment, but instead only what we really want or need to hear. Revealing how brains are wired to favor some auditory cues over others could even inspire new approaches toward automating and improving how voice-activated electronic interfaces filter sounds in order to properly detect verbal commands, the scientists said.
The authors also pointed out that how the brain can so effectively focus on a single voice may be of interest to companies that make consumer technologies, given the future market for electronic devices with voice-activated interfaces, such as Apple’s Siri.
However, Mesgarani, an engineer with a background in automatic speech recognition research, noted that separating a single intelligible voice from a cacophony of speakers and background noise is a surprisingly difficult engineering problem.
Speech recognition, he said, is “something that humans are remarkably good at, but it turns out that machine emulation of this human ability is extremely difficult.”
The article, “Selective Cortical Representation of Attended Speaker in Multi-Talker Speech Perception,” by Nima Mesgarani and Edward F. Chang, appears in the April 19, 2012 issue of the journal Nature.