Nvidia Unveils AI Model for Music and Audio Generation
Nvidia (NVDA) has introduced Fugatto, a new artificial intelligence model designed to generate and modify music, voices, and sounds. The stock was down 3.1% in early morning trading.
The company said Monday that the model is aimed at professionals in music production, film, and video game development.
Fugatto, short for Foundational Generative Audio Transformer Opus 1, lets users generate or transform audio using prompts made of text, audio, or both. According to Nvidia, the model can create entirely new sounds, swap out instruments in a song, turn written descriptions into musical passages, and even change the accent or emotion of a voice.
“We wanted to create a model that understands and generates sound like humans do,” said Rafael Valle, Nvidia’s manager of applied audio research.
The model has practical applications across many fields. Advertising agencies, for instance, could adapt a voiceover with different accents or emotions to suit campaigns in different regions. Video game developers could use Fugatto to modify audio assets on the fly in response to in-game action.
To demonstrate the model’s versatility, Nvidia highlighted its ability to produce unusual sound transformations, such as making a trumpet bark like a dog or a saxophone meow like a cat. Although it was not specifically trained for the task, Fugatto can also produce high-quality singing voices from text prompts with minimal fine-tuning and only small amounts of singing data.
Fugatto has 2.5 billion parameters and was trained on Nvidia DGX systems equipped with 32 H100 Tensor Core GPUs. The company said the model took more than a year to develop.
Nvidia has not said when Fugatto will be available for public or commercial use.
This article first appeared on GuruFocus.