Artificial Intelligence (AI) is accelerating technological progress, revolutionizing industries, and reshaping the way we interact with each other and the information around us. Advances in AI models, especially large language models and generative AI (GenAI), are transforming many areas of our lives and proving to be powerful tools for how we innovate and create.
Training data is central to the success of all current AI systems. The quality of data directly influences the performance and reliability of AI systems across various applications, and access to diverse training data is also one of the important safeguards against AI bias.
A large proportion of the training data currently used by large language models is collected from publicly available sources, for example, by scraping the Internet. These data sets often contain works such as text, images, designs, and music, which are protected by copyright.
The ninth session of the WIPO Conversation provided a platform for deep exploration, aiming to understand the multifaceted relationship between training data and IP. By evaluating current practices, proposing practical solutions, and envisioning future directions, this session fostered a holistic understanding of the impact of training data on the IP landscape.