VALL-E: Microsoft's new zero-shot text-to-speech model can duplicate everyone's voice in three seconds (Damir Yalalov/Metaverse Post)
Damir Yalalov / Metaverse Post:
In brief: With just a three-second sample of a speaker's voice, the transformer-based TTS model VALL-E can produce speech in that voice. This is a significant step toward more natural-sounding TTS systems.
http://www.techmeme.com/230109/p31?utm_source=dlvr.it&utm_medium=blogger#a230109p31