Microsoft AI Can Imitate Any Voice Based on Only 3 Seconds of Actual Recording

by Chris Black

This sure is moving fast!

I remember Jordan Peterson having a total breakdown because the AI was able to fake his voice after six hours of him talking.

Well, most people who are not public figures do not have six hours of recordings of themselves talking available.

Everyone has 3 seconds.

VALL-E demo 🤖

Made a quick video about VALL-E, a new speech synthesis (TTS) #AI model by Microsoft. It only requires a 3 second sample to generate audio from text input that is almost indistinguishable from the speaker's original voice. 🤯

Try 👉 https://t.co/AaPhHCeKO0 pic.twitter.com/U3aYEajxqc

— René Schulte (@rschu) January 12, 2023

Fox News:

Microsoft’s new language model Vall-E is reportedly able to imitate any voice using just a three-second sample recording.

The recently released AI tool was tested on 60,000 hours of English speech data. Researchers said in a paper out of Cornell University that it could replicate the emotions and tone of a speaker.

Those findings were apparently true even when creating a recording of words that the original speaker never actually said.
