
How HomePod uses machine learning to boost far-field Siri accuracy

2018 December 3

In a new post published today on its Machine Learning Journal blog, Apple details how HomePod, its wireless smart speaker, uses machine learning to boost far-field accuracy, helping Siri disregard background sounds in noisy environments and better understand your requests.

From the article:

The typical audio environment for HomePod has many challenges—echo, reverberation and noise. Unlike Siri on iPhone, which operates close to the user’s mouth, Siri on HomePod must work well in a far-field setting. Users want to invoke Siri from many locations, like the couch or the kitchen, without regard to where HomePod sits.

A complete online system, which addresses all of the environmental issues that HomePod can experience, requires a tight integration of various multichannel signal processing technologies. Accordingly, the Audio Software Engineering and Siri Speech teams built a system that integrates both supervised deep learning models and unsupervised online learning algorithms and that leverages multiple microphone signals.

The system selects the optimal audio stream for the speech recognizer by using top-down knowledge from ‘Hey Siri’ trigger phrase detectors.
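Apple doesn't publish the selection logic, but the top-down idea described in the excerpt can be sketched as a simple argmax over per-stream trigger detector scores (the stream names and score values below are purely hypothetical):

```python
def select_stream(streams, trigger_scores):
    """Pick the audio stream whose 'Hey Siri' detector scored highest.

    streams        : list of candidate enhanced audio streams
    trigger_scores : one trigger-phrase detector confidence per stream
    """
    best = max(range(len(streams)), key=lambda i: trigger_scores[i])
    return streams[best]

# Hypothetical example: three beams, the second hears the trigger best.
chosen = select_stream(["beam_a", "beam_b", "beam_c"], [0.20, 0.90, 0.45])
```

In the real system the candidate streams come from multichannel enhancement, and the detector scores provide the "top-down knowledge" the quote refers to.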

The rest of the article discusses using the various machine learning techniques for online signal processing, as well as the challenges Apple faced and their solutions for achieving environmental and algorithmic robustness while ensuring energy efficiency.

Long story short, Siri on HomePod implements a Multichannel Echo Cancellation (MCEC) algorithm, which uses a set of linear adaptive filters to model the multiple acoustic paths between the loudspeakers and the microphones and cancel the acoustic coupling.
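Apple's actual MCEC implementation isn't public; as a rough illustration of the idea, here is a minimal normalized-LMS (NLMS) sketch in which each loudspeaker reference signal gets its own adaptive FIR filter, and the summed filter outputs form the echo estimate subtracted from the microphone signal. The filter order, step size, and single-microphone setup are simplifying assumptions:

```python
import numpy as np

def mcec_nlms(mic, refs, order=64, mu=0.5, eps=1e-8):
    """Toy multichannel echo cancellation via NLMS adaptive filters.

    mic  : microphone samples (echo plus near-end speech)
    refs : list of loudspeaker reference signals, one per channel
    Each channel's FIR filter models the acoustic path from that
    loudspeaker to the microphone; the residual e = mic - echo_est
    is both the output and the adaptation error.
    """
    n = len(mic)
    w = [np.zeros(order) for _ in refs]  # one adaptive filter per channel
    out = np.zeros(n)
    for t in range(order - 1, n):
        # Most-recent-first tap vectors for each reference channel
        frames = [r[t - order + 1 : t + 1][::-1] for r in refs]
        echo_est = sum(f @ wi for f, wi in zip(frames, w))
        e = mic[t] - echo_est            # echo-cancelled sample
        out[t] = e
        for f, wi in zip(frames, w):     # normalized LMS weight update
            wi += mu * e * f / (f @ f + eps)
    return out
```

The per-channel normalization by the tap energy (`f @ f`) is what makes the step size robust to loud playback, which matters when the echo dwarfs the speech.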

Due to the close proximity of the loudspeakers to the microphones on HomePod, the playback signal can be significantly louder than a user’s voice command at the microphone positions, especially when the user moves away from the device. In fact, the echo signals may be 30-40 dB louder than the far-field speech signals, resulting in the trigger phrase being undetectable on the microphones during loud music playback.

This means that MCEC alone cannot remove the playback signal completely from your voice command.
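To put the 30-40 dB figure in perspective, converting the level difference into linear ratios shows the echo can carry roughly 1,000 to 10,000 times the power of the far-field speech:

```python
# Convert a dB level difference into linear amplitude and power ratios.
for db in (30, 40):
    amp = 10 ** (db / 20)   # amplitude ratio
    pwr = 10 ** (db / 10)   # power ratio
    print(f"{db} dB -> amplitude x{amp:.1f}, power x{pwr:.0f}")
# 30 dB -> amplitude x31.6, power x1000
# 40 dB -> amplitude x100.0, power x10000
```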

Siri voice command recorded in presence of loud playback music: microphone signal (top), output of MCEC (middle) and signal enhanced by Apple’s mask-based echo suppression (bottom)

To remove playback content remaining after the MCEC, HomePod uses a residual echo suppressor (RES) with help from Apple's machine learning models. To enable reliable trigger phrase detection, the RES mitigates residual linear echo, especially in the presence of double-talk and echo path changes.
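Apple derives its suppression mask with deep learning; as a simplified stand-in, the snippet below applies a Wiener-style gain per frequency bin, assuming a residual-echo power estimate is already available (in the real system that estimate would come from the learned model):

```python
import numpy as np

def mask_suppress(frame_spec, resid_echo_psd, floor=0.1):
    """Apply a Wiener-style suppression mask to one STFT frame.

    frame_spec     : complex spectrum of the MCEC output frame
    resid_echo_psd : estimated residual-echo power per frequency bin
                     (here simply given; a stand-in for a learned estimate)
    The per-bin gain shrinks toward `floor` where residual echo
    dominates and stays near 1 where near-end speech dominates.
    """
    sig_psd = np.abs(frame_spec) ** 2
    gain = np.maximum(1.0 - resid_echo_psd / (sig_psd + 1e-12), floor)
    return gain * frame_spec
```

Flooring the gain rather than zeroing bins is a common trade-off that limits the "musical noise" artifacts aggressive suppression can introduce.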

Be sure to read the full post and scroll down to Section 7, where multiple colorful waveforms with audio links beneath them let you hear for yourself how much of a user's request gets drowned out by music playing at high volume through HomePod's tweeters and woofer, and how much of it the system recovers.

HomePod uses machine learning for a lot of things, not just Siri. Content recommendation algorithms run on the device and use machine learning, as do HomePod’s digital audio processing and sound optimization techniques.

Source link: https://www.idownloadblog.com/2018/12/03/how-homepod-uses-machine-learning-to-boost-far-field-siri-accuracy/
