IPCCAT-neural at IPC subgroup level is now cross-lingual in 10 languages
October 4, 2019
With the new cross-lingual IPCCAT-neural, a system for automatic text categorization in the IPC, text can now be submitted in Arabic, Chinese, English, French, German, Japanese, Korean, Portuguese, Russian or Spanish for automatic classification at IPC subgroup level. The accuracy of the predictions is similar to that obtained for English, i.e. 84% for the top three IPC guesses among 73,633 symbols.
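To make the accuracy figure concrete, a "top three" score counts a prediction as correct when the true symbol appears anywhere among the first three guesses. The following is a minimal sketch of that metric only; the function name `top_k_accuracy` and the toy data are assumptions for illustration and are not taken from IPCCAT-neural or its evaluation.

```python
from typing import List, Sequence


def top_k_accuracy(ranked_guesses: Sequence[Sequence[str]],
                   true_symbols: Sequence[str],
                   k: int = 3) -> float:
    """Fraction of documents whose true symbol is among the first k guesses."""
    if len(ranked_guesses) != len(true_symbols):
        raise ValueError("one ranked guess list is needed per true symbol")
    if not true_symbols:
        return 0.0
    hits = sum(
        1
        for guesses, truth in zip(ranked_guesses, true_symbols)
        if truth in guesses[:k]
    )
    return hits / len(true_symbols)


# Toy data (hypothetical): two documents, each with guesses ranked best first.
ranked: List[List[str]] = [
    ["A", "B", "C", "D"],  # true symbol "C" is the third guess: a hit
    ["X", "Y", "Z", "Q"],  # true symbol "Q" is the fourth guess: a miss
]
truth: List[str] = ["C", "Q"]

score = top_k_accuracy(ranked, truth, k=3)
```

With this toy data one of the two documents is a hit, so `score` is 0.5.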
IPCCAT-neural combines approximately 8,000 neural networks, 30 million excerpts of already classified English-language patent documents (from the "WIPO EN Delta" dataset) and "WIPO Translate" services to predict the most relevant IPC symbols, each with an indicative confidence level.
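One plausible reading of this combination is a translate-then-classify flow: non-English text is first translated into English, then scored against IPC symbols, and the highest-scoring symbols are returned with their confidence levels. The sketch below illustrates that reading only. The announcement does not describe the internal flow, so the function `predict_ipc`, the two stand-in callables, and the scores are all hypothetical and do not represent the IPCCAT or WIPO Translate APIs.

```python
from typing import Callable, Dict, List, Tuple


def predict_ipc(text: str,
                source_lang: str,
                translate: Callable[[str, str], str],
                score_symbols: Callable[[str], Dict[str, float]],
                top_n: int = 3) -> List[Tuple[str, float]]:
    """Return the top_n (symbol, confidence) pairs, best first.

    Assumption: text that is not English is translated before classification.
    """
    english = text if source_lang == "en" else translate(text, source_lang)
    scores = score_symbols(english)
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


# Stand-ins for the real services (hypothetical, for illustration only).
def fake_translate(text: str, source_lang: str) -> str:
    return "a lithium battery electrode"


def fake_score_symbols(english_text: str) -> Dict[str, float]:
    # Illustrative symbols with made-up confidence values.
    return {
        "H01M 4/13": 0.62,
        "H01M 10/052": 0.21,
        "C01B 32/05": 0.09,
        "G06F 40/58": 0.01,
    }


guesses = predict_ipc(
    "une électrode de batterie au lithium",
    "fr",
    fake_translate,
    fake_score_symbols,
)
```

Passing the translation and scoring steps in as callables keeps the sketch independent of any particular service, so either stand-in can be swapped for a real client.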