Nvidia has unveiled a new artificial intelligence model that can generate, manipulate, and transform audio and music, and that the company says "can create any combination of music, voices and sounds."
Called Fugatto, this generative AI system can create entirely novel sound effects and musical compositions from text prompts alone. Perhaps even more impressively, it can also take existing audio recordings and radically alter them: for example, converting a piano melody into a human vocal line, or changing the accent and emotional tone of a spoken-word sample.
“If we think about synthetic audio over the past 50 years, music sounds different now because of computers, because of synthesizers,” Bryan Catanzaro, Nvidia’s vice president of applied deep learning research, told Reuters.
“I think that generative AI is going to bring new capabilities to music, to video games and to ordinary folks that want to create things.”
This technology could be a game-changer for music producers, film and video game sound designers, and creative amateurs alike. By automating tasks such as voice transformation, sound design, and musical composition, Fugatto could streamline audio production workflows across many industries.
As with any generative AI system, Fugatto's promise comes with risk, raising concerns about potential misuse such as creating misinformation or infringing copyright. As a result, Nvidia has not yet made plans to release Fugatto publicly while it continues to evaluate the risks and implications.