Training sets for automated text categorization in the IPC
This is the homepage of the WIPO automated categorization datasets. It describes the datasets' contents, explains how to get access, and links to relevant background sources.
We currently distribute, free of charge, two collections of XML documents that have been manually classified in a complex hierarchical taxonomy known as the International Patent Classification (IPC).
These data collections are made available to the community for research purposes. In this way, we specifically aim to encourage research into the automated categorization of patent documents. Users of the collections are requested to communicate their results to WIPO.
Full details about the content of the WIPO-alpha dataset can be found in the WIPO-alpha readme.
If you have any questions or comments, feel free to get in touch with us by email: firstname.lastname@example.org.