
Royalties in the age of AI: paying artists for AI-generated songs

By Dorien Herremans, Associate Professor, Singapore University of Technology and Design, Lead, Audio, Music and AI Lab (AMAAI)

May 6, 2025

The AI music industry is growing, raising questions about how to protect and pay artists whose work is used to train generative AI models. Are the answers in the models themselves?

The “Illiac Suite” is considered the first piece of music to be composed by an electronic computer. Lejaren Hiller, a professor and composer at the University of Illinois Urbana-Champaign, painstakingly programmed the school’s pioneering computer, the Illiac I, to generate four movements based on algorithmic probabilities. That was in 1956.

Today, with the rise of computing power and generative AI (genAI) technology, it is possible to generate music in your web browser through text prompts alone, all in a matter of seconds. New genAI models such as Suno and Udio can create impressive pieces, with polished melodies, harmonies and rhythms, as well as professionally mastered timbres. However, unlike the Illiac I, these models are trained using pre-existing music written by human hands. Therefore, this newfound ability to generate commercially viable music requires us to rethink how the industry protects and remunerates artists.


At the Audio, Music and AI Lab (AMAAI) at the Singapore University of Technology and Design, we’re exploring whether new AI models designed to detect similarities between pieces of music could reveal new ways to distribute royalties. In a musical landscape set to become increasingly dominated by AI, this research could help transform how creators are compensated.

Dorien Herremans. (Photo: Singapore University of Technology and Design)

How do we learn music – the original neural network

Our brains, which are made up of about 86 billion neurons connected by pathways called synapses, are the inspiration for AI models. Throughout our lives, we are exposed to tens of thousands of songs. Our brains implicitly learn patterns and expectations by forming new synaptic connections and strengthening existing ones.

In cognitive science, this process is known as statistical learning. The more we are exposed to certain patterns – such as the common perfect fifth interval (do-sol) in Western music – the stronger those connections become. This enables us to form expectations about music. For instance, when we hear a dissonant note that does not belong to a key, it violates our learned expectations, leading us to perceive it as wrong or out of place.
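
To make statistical learning concrete, here is a minimal illustrative sketch in Python (a toy first-order model of note transitions, not a claim about how the brain actually implements this): repeated exposure builds up transition counts, and a continuation that was never heard in a given context receives a near-zero expectation.

```python
from collections import Counter, defaultdict

# Toy corpus of note sequences standing in for the thousands of songs a
# listener hears over a lifetime.
corpus = [
    ["C", "G", "C", "G", "E"],
    ["C", "G", "E", "C", "G"],
    ["G", "C", "G", "C", "E"],
]

# Statistical learning: count how often each note follows another.
transitions = defaultdict(Counter)
for song in corpus:
    for prev, nxt in zip(song, song[1:]):
        transitions[prev][nxt] += 1

def expectation(prev, nxt):
    """Probability of hearing `nxt` after `prev`, given past exposure."""
    counts = transitions[prev]
    total = sum(counts.values())
    return counts[nxt] / total if total else 0.0

print(expectation("C", "G"))   # heard frequently -> high expectation
print(expectation("C", "F#"))  # never heard -> 0.0, perceived as out of place
```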


Our brains do not store entire musical pieces like a recording. Instead, they build neural pathways that encode patterns and structures in music. These pathways are what allow us to recognize and anticipate melodies and harmonies. When we hum or compose a song, we are not remembering a given recording but constructing music dynamically based on learned patterns.

How AI music is made

Deep learning networks are based on a similar idea. Artificial neural networks are inspired by human biology, particularly the theory of connectionism, which posits that knowledge emerges from strengthening the connections (synapses) between the brain’s processing units (neurons).

During their training, artificial neural networks are fed thousands of music pieces. They do not store these pieces, but rather learn the statistical relationship between their musical elements, much like our brains learn patterns through exposure.

After training, what remains is not a database of songs but a set of weight parameters that encode the statistical pathways needed to shape musical structure. These weights can be interpreted as the strength of the synapses in the brain. When it is time to generate music, the network performs inference. Given an input – often a text prompt – it samples from the learned statistical distribution to produce new sequences.
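
The sketch below illustrates that inference loop in deliberately simplified form. The `next_token_probs` function here is a placeholder, not a real model; in an actual system those probabilities would be computed from billions of learned weights rather than random scores.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = ["C4", "D4", "E4", "F4", "G4", "A4", "B4", "rest"]

def next_token_probs(context):
    """Stand-in for a trained network: maps a context of tokens to a
    probability distribution over the next musical token. In a real model
    these probabilities come from billions of learned weights."""
    logits = rng.normal(size=len(VOCAB))   # placeholder scores
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()                 # softmax -> valid distribution

def generate(prompt_tokens, length=16, temperature=1.0):
    """Autoregressive inference: repeatedly sample from the learned
    distribution, feeding each new token back in as context."""
    tokens = list(prompt_tokens)
    for _ in range(length):
        probs = next_token_probs(tokens) ** (1.0 / temperature)
        probs /= probs.sum()
        tokens.append(str(rng.choice(VOCAB, p=probs)))
    return tokens

print(generate(["C4", "E4"], length=8))
```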

However, these weight sets may contain billions of parameters, making the model a black box: an AI system whose internal workings are opaque and difficult to interpret. In an attempt to better understand these networks, researchers have developed techniques such as SHAP (SHapley Additive exPlanations) and LRP (Layer-wise Relevance Propagation), but our understanding of these complex networks remains limited.

Ethical AI music generator from text

This lack of understanding feeds into another issue: the lack of transparency in commercial systems. At the AMAAI Lab, we created Mustango, a controllable open-source text-to-music model similar to Meta’s MusicGen. But unlike Meta’s model, Mustango was trained exclusively on Creative Commons data.


Such openness is not the norm in the field. Commercial models such as Suno and Udio have not disclosed their training datasets, nor their model details. This raises important questions about how we should deal with copyright to facilitate ethical AI development in the music industry. This issue is illustrated by recent legal cases such as the Recording Industry Association of America (RIAA) v. Udio and Suno (June 2024).

AI music training detector

Because neural networks – unlike databases – do not store training songs but rather internalize statistical patterns, it is difficult to detect whether particular pieces of music were used to train a model. And because AI companies can easily delete their training data, audits are almost impossible.

At the AMAAI Lab, we are looking into how we can help verify whether models have been trained on particular songs. For this, we are exploring new techniques such as membership inference attacks and perturbation analysis. In the latter, for example, we make tiny changes to a song and observe how the model responds to them. If the model reacts strongly to small changes, it can indicate that the AI was exposed to this song during its training.
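
As a rough sketch of the perturbation idea (not a description of our actual pipeline), the code below assumes a hypothetical `model_loss` callable that returns the model’s loss for an audio clip, and measures how much that loss shifts when the clip is minimally perturbed.

```python
import numpy as np

rng = np.random.default_rng(0)

def perturb(audio, eps=1e-3):
    """Add a tiny amount of noise: a perceptually negligible change."""
    return audio + eps * rng.standard_normal(audio.shape)

def sensitivity_score(model_loss, audio, n_trials=10):
    """Compare the model's loss on the original clip with its loss on
    slightly perturbed copies. `model_loss` is a hypothetical callable
    (e.g. the model's negative log-likelihood for a clip). Following the
    intuition above, a model that has memorized a clip tends to react more
    sharply to tiny changes to it than to changes in unfamiliar material."""
    base = model_loss(audio)
    deltas = [abs(model_loss(perturb(audio)) - base) for _ in range(n_trials)]
    return float(np.mean(deltas))

# Comparing the score for a suspect song with scores for songs that were
# certainly unseen gives (weak) statistical evidence of training exposure.
```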

Licensing music datasets for machine learning

With the rise of these genAI systems comes a fundamental question: how do we treat artists fairly? Unless the courts find merit in the argument that copyrighted music may be used freely to train models because we hear music all around us all the time, commercial genAI systems should properly license the music datasets they use for training.

However, because there is no universal standard licensing mechanism, this would leave smaller startups and academic labs in a pinch. Without access to large datasets, they face significant barriers to training models or making their weights available open-source, thus slowing technological progress. Lacking legal clarity, these groups often cannot take the risk of facing legal action. In addition, acquiring large, legally sound datasets typically requires the kind of substantial up-front investment that precludes smaller tech companies from taking part.


Artists’ compensation for the use of their music to train AI models

Designing licensing models raises other questions, too. For example, if a model was trained on a hit song by Taylor Swift as well as songs by lesser-known artists, should all artists be compensated equally? A one-size-fits-all licensing fee may not be fair. A more equitable option could be a dynamic mechanism that looks at how much each song contributes to the generated output.

If a user inputs the prompt “create a song like Taylor Swift,” the generated output will be similar to the music of Taylor Swift. In this case, should we consider attribution according to likeness, ensuring that the artist whose music most significantly influences the output is compensated? For this to be possible, we would need technical advances, most notably highly accurate similarity models, to underpin such a dynamic and fair attribution mechanism.
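
Purely as a thought experiment, the sketch below shows how similarity scores between a generated track and tracks in a reference catalogue could be normalized into royalty shares. The embedding vectors, artist names and figures are all hypothetical; the point is the mechanism, not the numbers.

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def attribution_shares(generated_emb, catalogue, floor=0.0):
    """Turn similarity between a generated track and each catalogue track
    into normalized attribution shares. `catalogue` maps artist -> embedding.
    Similarities at or below `floor` are ignored."""
    sims = {artist: max(cosine(generated_emb, emb), 0.0)
            for artist, emb in catalogue.items()}
    sims = {a: s for a, s in sims.items() if s > floor}
    total = sum(sims.values())
    return {a: s / total for a, s in sims.items()} if total else {}

def split_royalty(pool, shares):
    """Distribute a royalty pool according to attribution shares."""
    return {artist: round(pool * share, 2) for artist, share in shares.items()}

# Toy embeddings standing in for the output of a music similarity model.
rng = np.random.default_rng(42)
catalogue = {"artist_a": rng.normal(size=8), "artist_b": rng.normal(size=8)}
generated = 0.8 * catalogue["artist_a"] + 0.2 * rng.normal(size=8)

print(split_royalty(100.0, attribution_shares(generated, catalogue)))
```

A real scheme would of course need agreed thresholds and audited similarity models; the `floor` parameter only hints at where such policy choices would enter.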

Audio embedding models

Natural language processing (NLP) provides the foundation for such similarity-based metrics. Since machine-learning models cannot deal with words directly, we translate them into vectors of numbers before feeding them to any model, a process called embedding. These vectors are essentially multidimensional coordinates, and researchers discovered from early models such as word2vec that words appearing in similar contexts have similar vector positions, in line with the distributional hypothesis.
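
A toy example with hand-written three-dimensional vectors (real word2vec embeddings have hundreds of dimensions and are learned from co-occurrence data, not written by hand) illustrates the effect: words used in similar contexts sit close together under cosine similarity.

```python
import numpy as np

# Toy 3-dimensional "embeddings" chosen for illustration only.
vectors = {
    "guitar": np.array([0.9, 0.1, 0.2]),
    "violin": np.array([0.8, 0.2, 0.1]),
    "invoice": np.array([0.1, 0.9, 0.7]),
}

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(vectors["guitar"], vectors["violin"]))   # high: similar contexts
print(cosine(vectors["guitar"], vectors["invoice"]))  # much lower: unrelated
```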

In the field of music, we use a similar embedding process to represent audio. At the AMAAI Lab, we are researching how to fine-tune such embeddings to create meaningful musical similarity metrics that can focus on timbre, melody, harmony, rhythm or even the input prompt itself. These metrics could also be expanded to detect plagiarism, although this research remains challenging due to the absence of clearly defined plagiarism rules and datasets.
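
One way such facet-specific metrics might be combined, sketched here with hypothetical embedding models standing in for fine-tuned encoders, is a weighted sum of per-facet similarities:

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical facet-specific embeddings: each maps an audio clip to a vector
# capturing one musical dimension (e.g. the output of a fine-tuned encoder).
FACETS = ("timbre", "melody", "harmony", "rhythm")

def music_similarity(clip_a_embs, clip_b_embs, weights=None):
    """Combine facet-specific similarities into one overall score.
    `clip_*_embs` map each facet name to that clip's embedding vector;
    `weights` lets an application emphasize, say, melody over timbre."""
    weights = weights or {f: 1.0 / len(FACETS) for f in FACETS}
    return sum(weights[f] * cosine(clip_a_embs[f], clip_b_embs[f])
               for f in FACETS)

# Toy usage with random vectors standing in for real embeddings.
rng = np.random.default_rng(1)
a = {f: rng.normal(size=16) for f in FACETS}
b = {f: rng.normal(size=16) for f in FACETS}
print(music_similarity(a, b, weights={"timbre": 0.1, "melody": 0.5,
                                      "harmony": 0.2, "rhythm": 0.2}))
```

Adjusting the weights would let a rights holder ask, for example, whether a generated track borrows mainly melody or mainly timbre.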

Enhancing human creativity through generative AI music

At the 2024 ISMIR (International Society for Music Information Retrieval) conference, keynote speeches such as that by Ed Newton-Rex, founder of Fairly Trained – a non-profit working to ensure that artists are paid for training data – added momentum to an outcry over artists’ rights and to calls for AI tools that empower music creators rather than replace them. Instead of models designed for pure music generation, AI could focus on enhancing composers’ creative process by acting as a collaborative partner, assisting with ideas for harmonization, accelerating workflows, infilling short melodic sections and more.

Much like the revolution sparked by the iPod and music streaming, the ongoing AI revolution, which is arguably bigger and more complex, is forcing the music industry to adapt rapidly. In doing so, we must keep in mind technologies that may help us facilitate transparency and ethical training practices.

The first public performance of the “Illiac Suite” in 1956 generated much commotion. One listener “presaged a future devoid of human creativity”. Today’s genAI music models have caused a similar uproar in artistic circles, as well as in the licensing arena. But these amazing new technologies could also lead to the development of collaborative tools that do not undermine but instead enhance artists’ creative processes, while also ensuring that they get a fair shake.

About the author

Dorien Herremans is an AI music researcher from Belgium and an Associate Professor at the Singapore University of Technology and Design (SUTD), where she leads the Audio, Music and AI Lab (AMAAI). Herremans has worked on automatic music generation and affective computing for many years. Her research has appeared in publications such as Vice Magazine and in French and Belgian national media. Herremans was part of a panel on “AI Output: To Protect or Not to Protect – That Is the IP Question” at the WIPO Conversation forum in November 2024.