Innovation Gender Gap: Creating a World Gender Name Dictionary

Measuring the participation of women in inventing, creating, and innovating activities is paramount in designing effective and inclusive innovation and intellectual property (IP) policies. Yet most national and international innovation and IP data sources lack any gender breakdown indicators or measures.

The World Gender Name Dictionary (WGND) helps solve this data gap.


Why is there a need for a World Gender Name Dictionary?

Measuring the gender of creators, inventors, and innovators is an important input to feed innovation policy-making with concrete evidence. Gender-related economic studies point to the lack of appropriate tools and data sources to ascertain the gender of these talented women and men. This is particularly challenging when seeking innovation data with a gender breakdown and acceptable global coverage.

There are several ways to obtain innovation data with gender breakdown.

But the WGND offers several advantages over the alternatives. First, it uses current and historical innovation records, which contain rich information, and attributes a gender to each innovator or creator based on their name. Second, the WGND has global coverage, allowing for gender attribution over a large majority of the world's population. Third, it can be applied consistently, even retroactively, using publicly available resources. This is why most studies on women patenting worldwide make use of name-gender dictionaries.

What is inside a Name-Gender Dictionary?

In a nutshell, a name-gender dictionary is a list of given names, each paired with its most commonly associated gender.

In general, a name-gender dictionary is best used in combination with information on the origin of the innovator or creator, for instance, their nationality. This is mainly because naming customs change from country to country. For example, Andrea is typically used for women in the U.S. and for men in Italy. As a result, the WGND (like many other dictionaries) provides combinations of name, country, and the most frequent gender.
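A lookup keyed on both name and country can be sketched as follows. This is a minimal illustration, not the WGND's actual data format: the two toy entries, the `NAME_COUNTRY_GENDER` table, and the `lookup_gender` function are assumptions made for the example.

```python
from typing import Optional

# Toy entries for illustration only (hypothetical layout, not the WGND file format).
NAME_COUNTRY_GENDER = {
    ("andrea", "US"): "F",
    ("andrea", "IT"): "M",
}

def lookup_gender(name: str, country: str) -> Optional[str]:
    """Return the most frequent gender for a (name, country) pair, or None if unknown."""
    # Normalize case and whitespace so "Andrea" and " andrea " hit the same entry.
    key = (name.strip().lower(), country.strip().upper())
    return NAME_COUNTRY_GENDER.get(key)

print(lookup_gender("Andrea", "US"))  # F
print(lookup_gender("Andrea", "IT"))  # M
print(lookup_gender("Andrea", "FR"))  # None: no entry for this pair
```

The same name yields a different answer depending on the country, which is why the name alone is not a sufficient key.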

The same logic applies to languages, as many naming customs are consistent across countries that speak the same language. Continuing the same example, Andrea is commonly used for women in all Spanish-speaking countries and for men in Italian-speaking regions, such as Lugano in Switzerland. The WGND takes advantage of this feature and propagates name and gender pairs to all countries with the same official languages, which increases its coverage. The latest version of the WGND also provides combinations of name, language, and the most frequent gender.
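The propagation step described above might look roughly like the sketch below. The source entries, the official-language table, and the function name `propagate_by_language` are all hypothetical, and the sketch simply keeps the first gender seen for a name and language; it does not attempt to resolve conflicts between sources.

```python
# Hypothetical inputs: toy source entries and a toy official-language table.
SOURCE_PAIRS = {("andrea", "ES"): "F", ("andrea", "IT"): "M"}
OFFICIAL_LANGUAGES = {
    "ES": ["es"], "MX": ["es"], "AR": ["es"],
    "IT": ["it"], "CH": ["de", "fr", "it"],
}

def propagate_by_language(pairs, official_languages):
    """Extend (name, country) pairs to countries sharing an official language."""
    # Step 1: derive (name, language) -> gender from countries that have source data.
    by_language = {}
    for (name, country), gender in pairs.items():
        for lang in official_languages.get(country, []):
            by_language.setdefault((name, lang), gender)
    # Step 2: fill in countries that share a language but have no entry of their own.
    expanded = dict(pairs)
    for (name, lang), gender in by_language.items():
        for country, langs in official_languages.items():
            if lang in langs:
                expanded.setdefault((name, country), gender)
    return expanded

expanded = propagate_by_language(SOURCE_PAIRS, OFFICIAL_LANGUAGES)
print(expanded[("andrea", "MX")])  # F, inherited from Spain via Spanish
print(expanded[("andrea", "CH")])  # M, inherited from Italy via Italian
```

Entries that come directly from a source are never overwritten; propagation only fills gaps.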

Even within the same country or language, several names can be used for both women and men. Name-gender dictionaries account for this in one of two ways. Some include a unisex or unknown gender category. Others record the frequency of each name and gender combination. For example, in Spain, the name Andrea belongs to a woman 97.7% of the time and to a man only 2.3% of the time. To account for these cases, the latest WGND reports the expected frequency of a name, country, and gender, whenever possible.
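Both approaches can be combined in a few lines. In the sketch below, the counts are invented so that they reproduce the 97.7% and 2.3% split quoted for Andrea in Spain, and the 0.9 threshold for calling a name unisex is an arbitrary assumption rather than a WGND rule.

```python
def gender_shares(counts):
    """Convert raw counts per gender into shares that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gender: n / total for gender, n in counts.items()}

def attribute_gender(counts, threshold=0.9):
    """Return the dominant gender, or 'unisex' / 'unknown' when it is not clear-cut."""
    shares = gender_shares(counts)
    if not shares:
        return "unknown"
    gender, share = max(shares.items(), key=lambda item: item[1])
    # threshold is a hypothetical cut-off chosen for this example.
    return gender if share >= threshold else "unisex"

# Invented counts that match the Andrea-in-Spain percentages from the text.
andrea_es = {"F": 977, "M": 23}
print(gender_shares(andrea_es))
print(attribute_gender(andrea_es))           # F
print(attribute_gender({"F": 55, "M": 45}))  # unisex
```

Keeping the shares rather than only the label lets each study choose its own cut-off.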

Of course, the quality of the gender attribution depends heavily on the quality and coverage of the gender-name dictionary. This is why global coverage of names is essential for a dictionary like this one.

How was the WGND created?

The main challenge in attributing gender is obtaining a gender-name dictionary with worldwide coverage. For this reason, the WGND compiles as many public and private gender-name dictionaries as exist, for as many countries as possible. The WGND then extends these to other countries based on the most frequent official languages spoken in each, increasing the global coverage of the existing national gender-name dictionaries.

The best sources of these names are national public institutions. Typically, these are social security agencies, statistical bureaus or institutes, and population registries. Other sources include academic and similar gender studies that share the results of their initiatives. Ad hoc sources can complement these, such as lists of popular names by country available on the Web.

An additional source is internal records or inside knowledge, which can be used to build a complementary name list. However, this type of source may be difficult for the general public to access. For instance, most United Nations agencies, such as WIPO, tend to collect lists of participants in their Member State Assemblies, meetings, training sessions, and the like. By protocol, the listed names tend to include honorific titles, such as “Mr.” and “Ms.”. National IP offices can compile a similar list from long historical staff records to create their own dictionaries. Another option is to ask translators and international colleagues to review the dictionaries for blind spots.
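As a rough illustration of how a participant list with honorific titles could feed a name list, the sketch below maps “Mr.” and “Ms.” to a gender and takes the first word after the title as the given name. Both the entry format and the assumption that the given name comes first are hypothetical; real lists vary and would need more careful parsing.

```python
import re
from typing import Optional, Tuple

# Only the two titles mentioned in the text; other titles are left unmatched.
HONORIFIC_GENDER = {"mr": "M", "ms": "F"}
PATTERN = re.compile(r"^\s*(mr|ms)\.?\s+(\S+)", re.IGNORECASE)

def parse_participant(entry: str) -> Optional[Tuple[str, str]]:
    """Return (given_name, gender) from an entry like 'Ms. Maria Lopez', else None."""
    match = PATTERN.match(entry)
    if match is None:
        return None
    title, first_token = match.groups()
    # Assumption: the first word after the title is the given name.
    return first_token.lower(), HONORIFIC_GENDER[title.lower()]

print(parse_participant("Mr. John Smith"))   # ('john', 'M')
print(parse_participant("Ms. Maria Lopez"))  # ('maria', 'F')
print(parse_participant("Dr. Alex Kim"))     # None: title carries no gender
```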

It is worth noting that sources may conflict about the gender of specific names. Such cases are usually few, but they have to be handled appropriately. One option is to treat them as frequencies, as in the example of the name Andrea in Spain above.
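One simple way to turn conflicting sources into frequencies is sketched below. It gives every source one equal vote per name and country, which is an assumption made for illustration; frequencies based on population counts, where available, would be more informative. The three toy sources and the function name `merge_sources` are hypothetical.

```python
from collections import Counter, defaultdict

def merge_sources(sources):
    """Merge several {(name, country): gender} tables into per-gender shares."""
    votes = defaultdict(Counter)
    for source in sources:
        for key, gender in source.items():
            votes[key][gender] += 1  # each source counts as one vote
    merged = {}
    for key, counter in votes.items():
        total = sum(counter.values())
        merged[key] = {gender: n / total for gender, n in counter.items()}
    return merged

# Three toy sources that disagree on one name.
source_a = {("andrea", "ES"): "F", ("maria", "ES"): "F"}
source_b = {("andrea", "ES"): "F"}
source_c = {("andrea", "ES"): "M", ("maria", "ES"): "F"}

merged = merge_sources([source_a, source_b, source_c])
print(merged[("andrea", "ES")])  # two votes for F, one for M
print(merged[("maria", "ES")])   # unanimous
```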

Building the first World Gender-Name Dictionary and updating it

In 2016, WIPO consolidated the first world gender-name dictionary (WGND) to identify the participation of women inventors, compiling the information from 14 different sources, which, when combined, cover 182 different countries and territories.

These 14 sources (13 public and one ad hoc list) totaled 319,785 pairs of names and territories. Because gender attribution rarely conflicts across countries that share a language, the WGND expands its coverage based on official languages.

The first WGND expanded the name-country pairs based on a common language for 12 frequent languages:

  • Arabic,
  • Dutch,
  • English,
  • French,
  • German,
  • Italian,
  • Japanese,
  • Korean,
  • Portuguese,
  • Russian,
  • Spanish, and
  • Chinese.

This expansion adds results for an additional 108 countries.

The final version of the first WGND contains more than 6 million unique name-country pairs covering 182 different countries.

Recently, WIPO revisited the WGND and expanded it based on updated data and additional sources. The second version compiles more than 40 different sources, including 5 million pairs of names and territories, and expands to 100 other languages.

The resulting second version of the WGND offers more than 26 million names linked to 195 different countries and territories, which can be used to disambiguate gender in innovation and IP data that name natural persons.

You can now find, use and work on the latest version of the dictionary in the Gender GitHub Repository.


Related resources

Identifying the gender of PCT inventors

This paper analyzes the gender of inventors in international patent applications. We compile a worldwide gender-name dictionary, which includes 6.2 million names for 182 different countries to disambiguate the gender of PCT inventors.