Tuesday, March 26, 2024

Suno: A ChatGPT for Music Is Here

“I’m just a soul trapped in this circuitry.” The voice singing those lyrics is raw and plaintive, dipping into blue notes. A lone acoustic guitar chugs behind it, punctuating the vocal phrases with tasteful runs. But there’s no human behind the voice, no hands on that guitar. There is, in fact, no guitar. In the space of 15 seconds, this credible, even moving, blues song was generated by the latest AI model from a startup named Suno. All it took to summon it from the void was a simple text prompt: “solo acoustic Mississippi Delta blues about a sad AI.” To be maximally precise, the song is the work of two AI models in collaboration: Suno’s model creates all the music itself, while calling on OpenAI’s ChatGPT to generate the lyrics and even a title: “Soul of the Machine.”
 
Online, Suno’s creations are starting to generate reactions like “How the fuck is this real?” As this particular track plays over a Sonos speaker in a conference room in Suno’s temporary headquarters, steps away from the Harvard campus in Cambridge, Massachusetts, even some of the people behind the technology are ever-so-slightly unnerved. There’s some nervous laughter, alongside murmurs of “Holy shit” and “Oh, boy.” It’s mid-February, and we’re playing with their new model, V3, which is still a couple of weeks from public release. In this case, it took only three tries to get that startling result. The first two were decent, but a simple tweak to my prompt — co-founder Keenan Freyberg suggested adding the word “Mississippi” — resulted in something far more uncanny. (...)

Suno uses the same general approach as large language models like ChatGPT, which break down human language into discrete segments known as tokens, absorb its millions of usages, styles, and structures, and then reconstruct it on demand. But audio, particularly music, is almost unfathomably more complex, which is why, just last year, AI-music experts told Rolling Stone that a service as capable as Suno’s might take years to arrive. “Audio is not a discrete thing like words,” Shulman says. “It’s a wave. It’s a continuous signal.” High-quality audio’s sampling rate is generally 44.1kHz or 48kHz, which means “48,000 tokens a second,” he adds. “That’s a big problem, right? And so you need to figure out how to kind of smoosh that down to something more reasonable.” How, though? “A lot of work, a lot of heuristics, a lot of other kinds of tricks and models and stuff like that. I don’t think we’re anywhere close to done.” Eventually, Suno wants to find alternatives to the text-to-music interface, adding more advanced and intuitive inputs — generating songs based on users’ own singing is one idea.
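[ed. To put Shulman's "48,000 tokens a second" in perspective, here is a rough back-of-the-envelope sketch in Python. It counts the raw samples in a two-minute song and compares that with a compressed token stream. The compressed figures (50 frames per second, 4 codes per frame) are hypothetical numbers chosen only for illustration. Suno has not said how its model "smooshes" audio down, so nothing here describes its actual method.]

```python
def raw_samples(duration_s: float, sample_rate_hz: int = 48_000) -> int:
    """Samples per channel; one sample per 'token' in Shulman's framing."""
    return round(duration_s * sample_rate_hz)


def compressed_tokens(duration_s: float, frames_per_s: int, codes_per_frame: int) -> int:
    """Token count for a hypothetical compressed representation of the audio."""
    return round(duration_s * frames_per_s * codes_per_frame)


song_seconds = 120  # a two-minute song

raw = raw_samples(song_seconds)
# Assumed, illustrative values; not taken from Suno or the article.
small = compressed_tokens(song_seconds, frames_per_s=50, codes_per_frame=4)

print(f"raw samples:       {raw:,}")
print(f"compressed tokens: {small:,}")
print(f"reduction factor:  {raw / small:.0f}x")
```

Under these assumed numbers, a two-minute song shrinks from millions of raw samples to tens of thousands of tokens, which is closer to the sequence lengths text models are built to handle.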

OpenAI faces multiple lawsuits over ChatGPT’s use of books, news articles, and other copyrighted material in its vast corpus of training data. Suno’s founders decline to reveal details of just what data they’re shoveling into their own model, other than the fact that its ability to generate convincing human vocals comes in part because it’s learning from recordings of speech, in addition to music. “Naked speech will help you learn the characteristics of human voice that are difficult,” Shulman says. (...)

Rodriguez sees Suno as a radically capable and easy-to-use musical instrument, and believes it could bring music making to everyone much the way camera phones and Instagram democratized photography. The idea, he says, is to once again “move the bar on the number of people that are allowed to be creators of stuff as opposed to consumers of stuff on the internet.” He and the founders dare to suggest that Suno could attract a user base bigger than Spotify’s. If that prospect is hard to get your head around, that’s a good thing, Rodriguez says: It only means it’s “seemingly stupid” in the exact way that tends to attract him as an investor. “All of our great companies have that combination of excellent talent,” he says, “and then something that just seems stupid until it’s so obvious that it’s not stupid.”

Well before Suno’s arrival, musicians, producers, and songwriters were vocally concerned about AI’s business-shaking potential. “Music, as made by humans driven by extraordinary circumstances … those who have suffered and struggled to advance their craft, will have to contend with the wholesale automation of the very dear-bought art they have fought to achieve,” Reid writes. But Suno’s founders claim there’s little to fear, using the metaphor that people still read despite having the ability to write. “The way we think about this is we’re trying to get a billion people much more engaged with music than they are now,” Shulman says. “If people are much more into music, much more focused on creating, developing much more distinct tastes, this is obviously good for artists. The vision that we have of the future of music is one where it’s artist-friendly. We’re not trying to replace artists.”

Though Suno is hyperfocused only on reaching music fans who want to create songs for fun, it could still end up causing significant disruption along the way. In the short term, the segment of the market for human creators that seems most directly endangered is a lucrative one: songs created for ads and even TV shows. Lucas Keller, founder of the management firm Milk and Honey, notes that the market for placing well-known songs will remain unaffected. “But in terms of the rest of it, yeah, it could definitely put a dent in their business,” he says. “I think that ultimately, it allows a lot of ad agencies, film studios, networks, etc., to not have to go license stuff.”

by Brian Hiatt, Rolling Stone | Read more:
Image: Harry Campbell
[ed. Link to the text-to-music song Soul of the Machine here. See also: Our AI-Generated Blues Song Went Viral — and Sparked Controversy (Rolling Stone):]

Just last summer, experts on the intersection of AI and music told Rolling Stone that it would be years before a tool emerged that could conjure up fully produced songs from a simple text description, given the endless complexities of the finished product. But Suno, a two-year-old start-up based in Cambridge, Massachusetts, has already pulled it off, vocals included — and their latest model, v3, which is available to the general public as of today, is capable of some truly startling results.

In Rolling Stone’s feature on Suno, part of our latest Future of Music package, we included an unsettling acoustic blues song called “Soul of the Machine,” fully generated by Suno, which uses ChatGPT to write lyrics unless you submit some yourself. The song — generated from the prompt “Mississippi Delta blues song about a sad AI” — went viral, with more than 36,000 plays in four days, and sparked debate over cultural appropriation, Suno’s training data (the precise contents of which they won’t reveal), the technology’s effects on human artists, and more. (...)

He also says he was stunned on a technical level that all of it was generated by AI — “not just the acoustic rural ‘blues’ guitar and the mournful ‘bluesman’s’ vocals, but also the room, ambience, of the simulated recording. No mics. No board. No high-ceiling converted small church transformed into a mobile recording space by a young, committed, Alan Lomax-type character, passionate to preserve vanishing sharecropper songs for posterity. It is not inconceivable that the Alan Lomax archive (and a lot more besides) was raided to train Suno’s AI.” (Suno has declined to reveal details of its training data, though one of its main investors, Antonio Rodriguez, told Rolling Stone that he is prepared for a potential lawsuit from labels and publishers.)