According to Google, 99% of Duo calls have to deal with jumbled or lost packets. A tenth of those calls lose more than 8% of their audio.
Generating speech: To fix the problem, the team built on a neural network developed by DeepMind that can generate realistic speech from text.
The new neural network, called WaveNetEQ, was trained on a large data set of 100 recorded human voices speaking 48 different languages until it could auto-complete short sections of speech based on common patterns in the way people talk.
Because Duo is end-to-end encrypted, the AI runs on the device, not in the cloud. During a call, WaveNetEQ learns characteristics of a speaker’s voice and generates audio snippets that match both the style and content of what the speaker is saying.
When a packet is lost, the AI-generated voice is inserted in its place.
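The substitution step can be sketched roughly as follows. This is a minimal Python illustration, not Google's implementation: the frame format, the `conceal` function, and the `repeat_and_fade` predictor are all assumptions made for the example. A simple repeat-at-half-volume fallback stands in where WaveNetEQ would generate speech.

```python
from typing import Callable, Dict, List

Frame = List[float]  # one packet's worth of audio samples (hypothetical format)


def repeat_and_fade(history: List[Frame]) -> Frame:
    """Placeholder predictor: replay the last frame at half volume.

    A stand-in for a generative model; NOT how WaveNetEQ works.
    """
    if not history:
        return []
    return [0.5 * sample for sample in history[-1]]


def conceal(
    received: Dict[int, Frame],
    total: int,
    predict: Callable[[List[Frame]], Frame] = repeat_and_fade,
) -> List[Frame]:
    """Rebuild a stream of `total` frames, filling gaps with predicted audio."""
    output: List[Frame] = []
    for seq in range(total):
        if seq in received:
            output.append(received[seq])
        else:
            # Packet lost: insert generated audio in its place.
            output.append(predict(output))
    return output


# Packets 0, 1 and 3 arrived; packet 2 was lost.
stream = conceal({0: [1.0, 1.0], 1: [0.8, 0.8], 3: [0.2, 0.2]}, total=4)
```

Here the missing third frame is filled from the audio that preceded it. Swapping `predict` for a learned model is where a system like the one described would differ from this fallback.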
Pixel 4 owners who use Google Duo have unknowingly been testing WaveNetEQ since December. As of Monday, the AI feature began rolling out to several other devices as well. Unfortunately, Google has yet to provide a list of those. What it did deliver this week is an in-depth explanation of how this tech fixes Google Duo audio issues.
Google Duo audio issues most often stem from missing and wrongly ordered data packets.
Audio issues stemming from data packet loss affect 99% of Google Duo calls, according to the Alphabet subsidiary. That figure may seem shockingly high, but it merely reflects how imperfect internet-based voice tech continues to be. DeepMind developed WaveNetEQ in response and claims the solution offers unprecedented packet loss mitigation.
In layman’s terms, the service analyzes call data and supplements it with predictive audio when necessary. The AI-created audio replaces not just lost packets, but also wrongly ordered ones. DeepMind’s platform stems from WaveRNN, an efficient neural audio synthesis solution first described in a research paper two years ago. WaveRNN itself was preceded by WaveNet, a deep neural network touted as a major breakthrough in AI-infused audio processing back in 2016.
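To make the distinction between lost and wrongly ordered packets concrete, here is a small Python sketch of how a receiver might sort arrivals into those that can still be played and those that need replacement audio. The `classify` function, the 20 ms frame size, and the 40 ms playout delay are illustrative assumptions, not details Google has published.

```python
from typing import Dict, List, Tuple

FRAME_MS = 20          # assumed audio duration carried by each packet
PLAYOUT_DELAY_MS = 40  # assumed buffering delay before playback starts


def classify(arrivals: List[Tuple[int, int]], total: int) -> Dict[int, str]:
    """Label each sequence number as 'on_time', 'late' or 'lost'.

    `arrivals` holds (sequence_number, arrival_time_ms) pairs in any order.
    """
    first_arrival: Dict[int, int] = {}
    for seq, t in arrivals:
        # Keep the earliest copy if a packet shows up more than once.
        first_arrival[seq] = min(t, first_arrival.get(seq, t))

    status: Dict[int, str] = {}
    for seq in range(total):
        deadline = PLAYOUT_DELAY_MS + seq * FRAME_MS  # when this frame must play
        if seq not in first_arrival:
            status[seq] = "lost"
        elif first_arrival[seq] <= deadline:
            status[seq] = "on_time"
        else:
            status[seq] = "late"
    return status


# Packet 2 overtakes packet 1, but both make their deadlines.
# Packet 3 arrives after its deadline; packet 4 never arrives.
result = classify([(0, 5), (2, 30), (1, 45), (3, 130)], total=5)
```

In a design like this, a packet that merely arrives out of order can still be slotted into place if it beats its playout deadline. Only the `late` and `lost` frames would need generated audio.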
Ultra-complex AI isn’t the only way to combat audio issues on internet calls. However, DeepMind claims it delivers unparalleled results with gaps longer than 60 milliseconds. Encouraged by initial results, the company intends to keep pursuing this packet loss concealment technique.