WIPO Speech to Text

WIPO Speech to Text is a powerful transcription tool that automatically converts audio and video content into text using artificial intelligence. It was designed specifically for international meetings and conferences. Once trained in a specific subject area, it outperforms general-purpose transcription tools.

The software was originally created and deployed to help transcribe official WIPO meetings, and it can be customized for other organizations worldwide.

What are the benefits of WIPO Speech to Text?


The tool can transcribe one hour of video/audio in five minutes when supported by the appropriate IT infrastructure.
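As a rough illustration of what that figure implies, the sketch below turns it into a speed-up factor and a processing-time estimate. It assumes that processing time scales linearly with recording length, which is not stated here, and the function names are hypothetical, not part of the tool.

```python
# Figures taken from the statement above: 60 minutes of audio/video
# transcribed in 5 minutes, given appropriate IT infrastructure.
AUDIO_MINUTES = 60
PROCESSING_MINUTES = 5


def speedup_factor() -> float:
    """How many times faster than real time the stated throughput is."""
    return AUDIO_MINUTES / PROCESSING_MINUTES


def estimated_processing_minutes(recording_minutes: float) -> float:
    """Estimate processing time, ASSUMING it scales linearly with length."""
    if recording_minutes < 0:
        raise ValueError("recording length cannot be negative")
    return recording_minutes / speedup_factor()


print(speedup_factor())                   # 12.0
print(estimated_processing_minutes(180))  # 15.0
```

On this assumption, a three-hour meeting would take about 15 minutes to transcribe. Actual times depend on the hardware available.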


The tool uses the latest neural machine learning technology and is particularly effective at transcribing audio from non-native speakers.


WIPO Speech to Text can be installed on-site to guarantee security and confidentiality.

What do I need to use it?

  • GPU-based server infrastructure if you wish to run the tool independently; otherwise, cloud-based infrastructure will work.
  • For best results, a large collection of domain-specific data in the relevant language can be used to customize the tool.

Which languages does it work with?

WIPO Speech to Text is currently available only in English. We are working on extending the tool to the other five official United Nations languages (Arabic, Chinese, French, Russian and Spanish).

How can I get it for my organization?

WIPO Speech to Text is available via standard licensing agreements, and WIPO's team of experts can help you install and set up the tool.

Our user guide gives you a quick walkthrough of how to use WIPO Speech to Text.

Who uses it?

WIPO Speech to Text is used by organizations hosting international meetings to transcribe their proceedings accurately and efficiently.