A major barrier to entry in the music industry is production costs. Even once an artist collects the funds, finding a music producer and studio to meet their needs can be extremely challenging. So, what if you could just tell your computer to make the beat you envision? With Google's MusicLM model, generating music from text could be a reality.
Last week, Google released an academic paper discussing MusicLM, its generative AI model that makes music from users' text prompts. The model can make anything from a 10-second audio clip to a full song, incorporating as many specific details as you give it. It can also take an existing song and reproduce it with a different sound.
According to the paper, prompts for the AI model can include detailed commands such as "enchanting jazz song with a memorable saxophone solo and a solo singer" or "Berlin 90s techno with a low bass and strong kick." Google has published audio samples covering the model's different prompts and abilities alongside the paper.
"Yesterday, Google published a paper on a new AI model called MusicLM. The model generates 24 kHz music from rich captions like 'A fusion of reggaeton and electronic dance music, with a spacey, otherworldly sound. Induces the experience of being lost in space.'"
- Product Hunt (@ProductHunt) January 27, 2023
To create the music, the system is trained on a 280,000-hour dataset of unlabeled music, which teaches MusicLM to generate long, coherent music at 24 kHz, according to the paper.
This isn't Google's, or the industry's, first attempt at an AI song system. OpenAI, the AI research company behind ChatGPT and DALL-E, has its own version, Jukebox, which was published as research code in 2020 but has not been released as a consumer product. Riffusion, a neural network that produces music using images of sound, is already available to the public.
But according to Google, its new system is better than anything done before: "Our experiments show through quantitative metrics and human evaluations that MusicLM outperforms previous systems such as Mubert and Riffusion, both in terms of quality and adherence to the caption."
So, when will we be able to use this "better than anything out there" AI model? The answer is, unfortunately, not any time soon.
In the paper, Google acknowledges the risks that these kinds of models pose: the potential misappropriation of creative content, biases inherent in the training data that could affect cultures underrepresented in it, and fears over cultural appropriation. For all of these reasons, Google says it has no plans to release the models at this point.
Recently, we have seen AI models that pose the risks Google describes. Since the release of AI-generated art tools such as Lensa's Magic Avatars, artists have been speaking out about their art being stolen by AI art models without credit or compensation.
At the same time, the sudden interest in AI tools such as ChatGPT has reportedly prompted Google to consider rolling out AI-based products more quickly.