by Chris Black
This sure is moving fast!
I remember Jordan Peterson having a total breakdown because an AI was able to fake his voice from six hours of him talking.
Well, most people who are not public figures do not have six hours of recordings of themselves talking available.
Everyone has 3 seconds.
VALL-E demo 🤖
Made a quick video about VALL-E, a new speech synthesis (TTS) #AI model by Microsoft. It only requires a 3 second sample to generate audio from text input that is almost indistinguishable from the speaker's original voice. 🤯
Try 👉 t.co/AaPhHCeKO0 pic.twitter.com/U3aYEajxqc
— René Schulte (@rschu) January 12, 2023
Microsoft’s new language model VALL-E is reportedly able to imitate any voice using just a three-second sample recording.
The recently announced AI tool was trained on 60,000 hours of English speech data. In a paper posted to arXiv, the preprint server hosted by Cornell University, the researchers said it could replicate the emotions and tone of a speaker.
That reportedly held even when the tool generated a recording of words the original speaker never actually said.