Making sense of patent information

December 2016

By Catherine Jewell, Edward Harris and Steven Kelly, Communications Division, WIPO

The patent system is designed both to recognize and reward inventors and to make technological information available to the general public with a view to spurring new innovations. Patent documents contain a huge amount of technological information and are a valuable source of business intelligence.

WIPO has developed a variety of new services and tools to enable innovators and companies to mine this information in support of their own research or business goals. In autumn 2016, it launched two new tools that will make it even easier to search and make sense of the huge volume of patent information generated every year in diverse languages across the globe.

Leading the way in machine translation

WIPO Translate is a ground-breaking translation tool for patent documents based on artificial intelligence which promises to unlock a wealth of previously inaccessible technological information.

Unveiled in late October 2016, the latest version of WIPO Translate employs neural machine translation (NMT) technology to offer innovators the highest-quality service yet available for accessing information about new technologies.

[Photo: Farrant]

“One of the aims of the patent system is to make technology available,” explains WIPO Director General Francis Gurry. “Language is often a barrier to achieving that aim universally.”

Mr. Gurry heralded the development of WIPO Translate as “another great step forward”, which means that “a vast and ever-increasing trove of patent documents will soon be more easily accessible to innovators who search these records for inspiration or technical know-how” (video on YouTube).

WIPO Translate outperforms existing machine translation tools. It incorporates cutting-edge neural machine translation technology that enables it to convert highly technical patent documents into a second language in a style and syntax that closely mirrors common usage.

Neural machine translation is an emerging technology inspired by the structure and function of the brain’s biological neural networks. It is a radical departure from the phrase-based statistical translation on which most established tools have been built. With this technology, a large neural network “learns” from previously translated sentences, which enables it to generate highly accurate translations. To build WIPO Translate, some 60 million sentences from Chinese patent documents, provided by the State Intellectual Property Office of the People’s Republic of China through WIPO’s PATENTSCOPE database, were compared with their equivalent official English translations as filed at the United States Patent and Trademark Office.
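The article does not describe WIPO Translate’s internal architecture, but the general idea of “learning” from translated sentence pairs can be illustrated. In a typical NMT system, training adjusts the network so that, position by position, it assigns high probability to the words of the reference translation; the quantity minimised is usually the negative log-likelihood (cross-entropy) of that reference. The Python sketch below only computes this quantity for two sets of made-up model outputs. The probability tables and the function name `sentence_nll` are illustrative assumptions, not part of any WIPO tool.

```python
import math


def sentence_nll(reference, step_distributions):
    """Negative log-likelihood of a reference translation.

    step_distributions[i] maps candidate target words to the probability a
    (hypothetical) model assigns at output position i, given the source
    sentence and the reference words before position i.
    """
    if len(reference) != len(step_distributions):
        raise ValueError("need one distribution per reference word")
    total = 0.0
    for word, dist in zip(reference, step_distributions):
        p = dist.get(word, 0.0)
        if p <= 0.0:
            return math.inf  # the model ruled out the reference word entirely
        total -= math.log(p)
    return total


reference = ["the", "battery"]

# Made-up outputs of an untrained model: every candidate word equally likely.
untrained = [{"the": 0.25, "a": 0.25, "battery": 0.25, "cell": 0.25}] * 2

# Made-up outputs after training: probability concentrated on the reference.
trained = [
    {"the": 0.5, "a": 0.3, "battery": 0.1, "cell": 0.1},
    {"the": 0.05, "a": 0.05, "battery": 0.8, "cell": 0.1},
]

print(round(sentence_nll(reference, untrained), 3))  # 2.773
print(round(sentence_nll(reference, trained), 3))    # 0.916
```

A lower value means the model finds the official translation more likely. Training on millions of sentence pairs amounts to driving this number down across the whole corpus.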

Emphasis on East Asian languages

The neural version of WIPO Translate complements WIPO’s existing statistical machine translation tools, which are available for 16 language pairs. The neural version is currently available in beta mode for translations from Chinese to English, with English-to-Chinese translation capability to be rolled out shortly. This language pairing was prioritized because of the increasingly high level of patenting activity in China, which in 2015 accounted for around 15 percent of global patenting activity. “As part of a global trend, patent applications are increasingly being filed in East Asian languages, particularly Chinese, and WIPO Translate helps ensure that state-of-the-art knowledge created in these languages is shared as widely and rapidly as possible,” explains Mr. Gurry.

WIPO plans to extend WIPO Translate NMT to patent applications in Japanese, Korean and French, with other languages to follow soon.

How to use WIPO Translate

WIPO Translate is free of charge and available through WIPO’s PATENTSCOPE database. To translate a text, simply copy and paste it into the “Text to be translated” box, select the language you require and press the “Translate” button, or click the “WIPO Translate” button within PATENTSCOPE search results. In either case, the text is translated instantly.

Patent searches for chemistry and pharmacology made easy

Another new development allows users to mine chemical or pharmacological data from the patent documents contained in PATENTSCOPE.

PATENTSCOPE’s new chemical structure search functionality makes patent searches in the fields of chemistry and pharmacology much easier. Again, it is completely free of charge. It increases the searchability of patent documents, enabling users to search by the name of chemical compounds or by their structure as outlined in drawings embedded in patent applications or patents.

Chemical substances are described in many different ways in patent documents (see Table 1), which makes patent searches in the fields of chemistry and pharmacology notoriously difficult. A substance may be named according to any of several naming conventions, or shown only in a drawing of its chemical structure, and a pharmaceutical substance may also have one or more officially accepted or commonly used names. A thorough search of chemical information in patent documents therefore usually involves several search parameters and sources, each with its own limitations, chosen according to the purpose of the search and the type of information sought.
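The practical consequence of this variety can be sketched with the atazanavir identifiers listed in Table 1. The Python below runs a naive search over a small, invented set of patent records: a query on a single name misses documents that use a different identifier, while OR-ing all known names and then narrowing by an IPC code (as Table 1 suggests) gives a more complete yet focused result. The records, their IPC codes other than A61P 31/18, and the `search` function are hypothetical; this is not how PATENTSCOPE indexes documents.

```python
# Identifiers for one substance, taken from Table 1 (atazanavir).
ATAZANAVIR_NAMES = ["BMS-232632", "atazanavir", "Reyataz", "198904-31-3"]

# Hypothetical mini-collection; texts and most IPC codes are invented.
DOCUMENTS = [
    {"id": "doc-1", "text": "Salts of BMS-232632 for oral dosage forms", "ipc": ["A61P 31/18"]},
    {"id": "doc-2", "text": "Reyataz capsule packaging", "ipc": ["B65D 83/04"]},
    {"id": "doc-3", "text": "A battery electrode", "ipc": ["H01M 4/02"]},
]


def search(documents, names, ipc_prefix=None):
    """Return ids of documents whose text mentions any of the names
    (case-insensitive substring match), optionally keeping only documents
    that carry an IPC code starting with ipc_prefix."""
    lowered = [name.lower() for name in names]
    hits = []
    for doc in documents:
        text = doc["text"].lower()
        if not any(name in text for name in lowered):
            continue
        if ipc_prefix and not any(code.startswith(ipc_prefix) for code in doc["ipc"]):
            continue
        hits.append(doc["id"])
    return hits


print(search(DOCUMENTS, ["atazanavir"]))                              # []
print(search(DOCUMENTS, ATAZANAVIR_NAMES))                            # ['doc-1', 'doc-2']
print(search(DOCUMENTS, ATAZANAVIR_NAMES, ipc_prefix="A61P 31/18"))   # ['doc-1']
```

Plain substring matching is deliberately simple here; a real search engine would tokenize text and normalize spellings. No list of names can catch a compound disclosed only as a drawing, which is the gap a structure search is meant to fill.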

Main beneficiaries

PATENTSCOPE’s new chemical structure search function will benefit patent examiners and IP professionals, especially those operating in countries with less developed IP services. It will help improve the quality of the prior-art searches carried out to determine patentability and will strengthen the validity of the patents that are granted. The new search facility will also help IP offices in developing countries handle queries about the patent status of particular medicines, reducing their need to spend limited resources on commercial databases.

Health professionals and procurement agencies that rely on publicly available patent databases to carry out patent searches, for example to ascertain the IP status of certain commercially available drugs in a given country, also stand to benefit. The increased transparency of the patent system that this new tool enables will also benefit researchers and manufacturers of generic versions of pharmaceuticals wishing to follow new developments and trends in the fields of chemistry and pharmacology.

How to use PATENTSCOPE’s chemical structure search

No specialized knowledge is required to use this new tool. Simply log into your PATENTSCOPE account (which can be created free of charge) and follow the user guide for chemical search available in the “Help” menus under “How to search”.

To search for chemical compounds embedded in patent documents, you have three options:

  1. The “Upload structure” option allows you to upload a file containing a chemical description in a supported format (e.g. MOL, SMILES, or bitmap images of a chemical compound such as .png, .gif, .tiff or .jpeg files).
  2. The “Convert a structure” option allows you to select your preferred type of search parameter. You can search by the various names of the compound, including its commercial name, its name in the Chemical Abstracts Service (CAS) Registry, its informal name, its international non-proprietary name (INN), its IUPAC International Chemical Identifier (InChI), or its simplified molecular-input line-entry system (SMILES) string.
  3. The “Structure editor” option allows you to draw or edit a chemical structure, reaction or fragment. These can be sketched out in the same way as they usually are on paper.
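Some of these identifiers can be checked before a search is run. A CAS Registry Number consists of two to seven digits, a hyphen, two digits, a hyphen and a final check digit. The check digit equals the sum of each preceding digit multiplied by its position counted from the right (starting at 1), modulo 10. The Python sketch below applies that rule. `is_valid_cas_number` is an illustrative helper, and the article does not say whether PATENTSCOPE performs such a check itself.

```python
import re

# Two to seven digits, two digits, one check digit, separated by hyphens.
CAS_PATTERN = re.compile(r"(\d{2,7})-(\d{2})-(\d)")


def is_valid_cas_number(cas):
    """Return True if cas has the CAS Registry Number layout and a correct check digit."""
    match = CAS_PATTERN.fullmatch(cas)
    if not match:
        return False
    body = match.group(1) + match.group(2)
    check = int(match.group(3))
    # Weight the digits 1, 2, 3, ... starting from the rightmost digit of the body.
    total = sum(int(digit) * weight for weight, digit in enumerate(reversed(body), start=1))
    return total % 10 == check


print(is_valid_cas_number("198904-31-3"))  # True (atazanavir, Table 1)
print(is_valid_cas_number("198904-31-4"))  # False (wrong check digit)
print(is_valid_cas_number("atazanavir"))   # False (not a CAS number)
```

For atazanavir’s number 198904-31-3, the weighted sum of the digits 1, 9, 8, 9, 0, 4, 3, 1 is 183, so the check digit 3 is consistent.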

The search is then run against the title, abstract, claims and description fields of patent documents in PATENTSCOPE, and works only with developed (structural) formulas. The search tool is currently available for international patent applications filed under the PCT in English and German from 1978, and for the national collection of the United States from 1979. Other languages and collections will be added soon.
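The coverage described in the paragraph above can be written as a small predicate, which may help when judging whether a structure search is likely to reach a given document. The Python sketch encodes the December 2016 coverage as stated in the article. The `PatentDocument` record, the collection labels "WO" and "US", and the use of a single year field are assumptions made for illustration; the article does not say whether the cut-off years refer to filing or publication.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatentDocument:
    collection: str  # hypothetical label: "WO" = PCT international application, "US" = US national collection
    language: str    # filing language as an ISO 639-1 code, e.g. "en" or "de"
    year: int        # filing vs. publication year is not specified in the article


def structure_search_covers(doc: PatentDocument) -> bool:
    """Coverage of the chemical structure search as described in December 2016."""
    if doc.collection == "WO":
        # PCT international applications in English or German, from 1978
        return doc.language in {"en", "de"} and doc.year >= 1978
    if doc.collection == "US":
        # United States national collection, from 1979
        return doc.year >= 1979
    # Other collections were not yet covered
    return False


print(structure_search_covers(PatentDocument("WO", "de", 1995)))  # True
print(structure_search_covers(PatentDocument("WO", "fr", 1995)))  # False
print(structure_search_covers(PatentDocument("US", "en", 1978)))  # False
```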

Table 1: Examples of search parameters for pharmaceutical substances

| Search parameter | Example | Explanation |
|---|---|---|
| Manufacturer name | BMS-232632 | During the R&D stage, a substance is identified in the laboratory or in publications by a code (a combination of letters and numbers). |
| INN (generic name) | atazanavir | Each pharmaceutical substance is identified by a unique and universally available designated name. |
| Brand name | Reyataz® | Once a drug receives marketing approval, it is sold under a proprietary name registered for trademark protection. |
| IUPAC chemical name | methyl N-[(1S)-1-{[(2S,3S)-3-hydroxy-4-[(2S)-2-[(methoxycarbonyl)amino]-3,3-dimethyl-N'-{[4-(pyridin-2-yl)phenyl]methyl}butanehydrazido]-1-phenylbutan-2-yl]carbamoyl}-2,2-dimethylpropyl]carbamate | The International Union of Pure and Applied Chemistry (IUPAC) sets standards for naming chemical elements and compounds in a structured manner. |
| CAS Registry Number | 198904-31-3 | When a new compound is published in the chemical literature or in patents, the Chemical Abstracts Service (CAS) assigns it a unique numeric identifier.¹ |
| International Patent Classification (IPC) code | A61P 31/18 | Although IPC codes do not pinpoint a particular substance, a code can be combined with other search parameters to narrow down search results. |
| Molecular formula | C38H52N6O7 | This chemical formula shows the number and kinds of atoms in a molecule. |
| Chemical structure (graphic formula) | (structure drawing not reproduced) | Several commercial services offer patent search databases that allow searching for compounds by chemical structure in addition to keywords (names) and classification codes. They use various indexing rules so that searchers can also find chemical compounds described in a Markush structure. |

1 While other organizations also assign identifiers to chemical compounds, the CAS Registry Number is one of the identifiers most widely used by experts in the field of chemistry.
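The molecular formula in Table 1 can be unpacked mechanically: C38H52N6O7 lists 38 carbon, 52 hydrogen, 6 nitrogen and 7 oxygen atoms. The Python sketch below parses simple formulas of this kind and, as a check, computes an approximate molar mass from rounded standard atomic weights. The helper names are illustrative, the weight table covers only the four elements needed here, and the parser deliberately ignores brackets, charges and hydrates.

```python
import re

# Rounded standard atomic weights (g/mol), only for the elements used below.
ATOMIC_WEIGHTS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}


def parse_formula(formula):
    """Parse a simple formula such as 'C38H52N6O7' into {element: count}."""
    counts = {}
    pos = 0
    for match in re.finditer(r"([A-Z][a-z]?)(\d*)", formula):
        if match.start() != pos:  # something unparseable sits between symbols
            raise ValueError(f"unexpected text in formula: {formula!r}")
        element, number = match.group(1), match.group(2)
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
        pos = match.end()
    if pos != len(formula):  # trailing characters that are not element symbols
        raise ValueError(f"unexpected text in formula: {formula!r}")
    return counts


def molar_mass(formula):
    """Approximate molar mass; raises KeyError for elements not in ATOMIC_WEIGHTS."""
    return sum(ATOMIC_WEIGHTS[element] * n for element, n in parse_formula(formula).items())


print(parse_formula("C38H52N6O7"))          # {'C': 38, 'H': 52, 'N': 6, 'O': 7}
print(round(molar_mass("C38H52N6O7"), 2))   # 704.87
```

Because different compounds (isomers) can share the same molecular formula, a formula on its own is a coarse search parameter, which is one reason Table 1 lists several complementary identifiers.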

The WIPO Magazine is intended to help broaden public understanding of intellectual property and of WIPO’s work, and is not an official document of WIPO. The designations employed and the presentation of material throughout this publication do not imply the expression of any opinion whatsoever on the part of WIPO concerning the legal status of any country, territory or area or of its authorities, or concerning the delimitation of its frontiers or boundaries. This publication is not intended to reflect the views of the Member States or the WIPO Secretariat. The mention of specific companies or products of manufacturers does not imply that they are endorsed or recommended by WIPO in preference to others of a similar nature that are not mentioned.