AssetID: 53802761
Headline: RAW VIDEO: AI Headphones Let Wearer Listen To A Single Person In A Crowd By Looking At Them Just Once
Caption: Researchers have developed AI-powered headphones that allow users to focus on a single speaker in a noisy environment by looking at them briefly. After the wearer looks at a person for three to five seconds and presses a button to enroll their speech, the system, called “Target Speech Hearing” (TSH), cancels all other sounds in the environment and plays just the enrolled speaker’s voice in real time, even as the listener moves around in noisy places and no longer faces the speaker. The University of Washington team presented its findings on May 14 in Honolulu at the ACM CHI Conference on Human Factors in Computing Systems. “We tend to think of AI now as web-based chatbots that answer questions,” said senior author Shyam Gollakota, a UW professor in the Paul G. Allen School of Computer Science & Engineering. “But in this project, we develop AI to modify the auditory perception of anyone wearing headphones, given their preferences. With our devices you can now hear a single speaker clearly even if you are in a noisy environment with lots of other people talking.” To use the system, a person wearing off-the-shelf headphones fitted with microphones taps a button while directing their head at someone talking. The sound waves from that speaker’s voice should then reach the microphones on both sides of the headset simultaneously; there’s a 16-degree margin of error. The headphones send that signal to an onboard embedded computer, where the team’s machine-learning software learns the desired speaker’s vocal patterns. The system latches onto that speaker’s voice and continues to play it back to the listener, even as the pair moves around, and its ability to focus on the enrolled voice improves as the speaker keeps talking, giving the system more training data. The team tested the system on 21 subjects, who on average rated the clarity of the enrolled speaker’s voice nearly twice as high as that of the unfiltered audio. This work builds on the team’s previous “semantic hearing” research, which allowed users to select specific sound classes, such as birds or voices, that they wanted to hear while canceling other sounds in the environment. Currently, the TSH system can enroll only one speaker at a time, and it can enroll a speaker only when no other loud voice is coming from the same direction as the target speaker’s. If a user isn’t happy with the sound quality, they can run another enrollment on the speaker to improve the clarity. The team is working to expand the system to earbuds and hearing aids in the future.
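For illustration only, and not code from the UW paper: the caption describes enrollment as checking that the speaker’s voice reaches both headset microphones at nearly the same time, with a 16-degree margin of error. A minimal Python sketch of that kind of binaural timing check is below; the microphone spacing, sample rate, and function names are assumptions introduced here, not details from the source.

```python
import numpy as np

SPEED_OF_SOUND_M_S = 343.0   # approximate speed of sound in air
MIC_SPACING_M = 0.18         # assumed distance between the two headset mics
SAMPLE_RATE_HZ = 16_000      # assumed audio sample rate

def estimate_arrival_angle(left: np.ndarray, right: np.ndarray) -> float:
    """Estimate the speaker's bearing from the inter-mic delay.

    Cross-correlates the two channels to find the lag (in samples) at
    which they best align, converts that lag to a time difference, and
    maps it to an angle via delay = spacing * sin(angle) / c.
    """
    corr = np.correlate(left, right, mode="full")
    lag_samples = int(np.argmax(corr)) - (len(right) - 1)
    delay_s = lag_samples / SAMPLE_RATE_HZ
    # Clamp to the physically possible range before inverting sin().
    sin_theta = np.clip(delay_s * SPEED_OF_SOUND_M_S / MIC_SPACING_M, -1.0, 1.0)
    return float(np.degrees(np.arcsin(sin_theta)))

def facing_speaker(left: np.ndarray, right: np.ndarray,
                   margin_deg: float = 16.0) -> bool:
    """True if the voice arrives within the stated margin of straight ahead."""
    return abs(estimate_arrival_angle(left, right)) <= margin_deg
```

This sketch covers only the geometric gating step; per the caption, the actual system then passes the captured audio to machine-learning software on the onboard embedded computer, which learns the enrolled speaker’s vocal patterns and extracts that voice thereafter.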
Keywords: feature,photo feature,photo story
PersonInImage: