Gender & data: personal concerns with gender assignment processes

I am still quite new to the research world, but if there is one thing I’ve learned during this past year, it is that nothing is as simple as it may seem at first glance. One aspect of research that has made me realise this is assigning gender to names.

On the one hand, I work with bibliometric data, so I have huge lists of researchers with information about their names, country affiliations, and other relevant data. On the other hand, I am also interested in gender. A first, and crucial, step is therefore to assign gender to those names. However, things are not as simple as assuming that, for instance, every “Sara” is a woman.

First, using the Sara example, how do we know that someone called Sara is a woman? Have we asked them? Is it okay for us to assume someone’s gender? Using names as an indicator for gender is a huge problem. Not only are we taking a leap of faith by assuming a relationship between names and gender, but we are also reinforcing stereotypes that link “Sara” to womanhood. If we acknowledge that gender is a spectrum and that the name Sara can also be used by men and non-binary folks, are the results of research that assigns gender to names based on stereotypes reliable and ethically responsible?

We could say that when we assign gender to names, we are not looking for the gender of a specific person; we are looking for patterns and at the big picture. In that view, we are not assuming someone’s gender, but estimating the probability that a name belongs to a woman or a man, statistically speaking. And, statistically speaking, data does not show huge numbers of trans and non-binary people in academia. However, they exist, and their struggles against a firm and rigid idea of gender are real. Plus, data may not show their presence because it is not designed to do so (for instance, if questionnaires only accept the answers Man/Woman, or if gender assignment algorithms take a binary approach). It is important to take their existence into account. If we do not, how are we going to find the inner workings of science that may be a cause of their struggle in academia? After all, we know they struggle more than cis people do in academia (Gibney, 2019).
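To make the "probability, not person" argument concrete, here is a minimal sketch of what a name-level estimate could look like. Everything in it is an assumption made for illustration: the reference counts are invented, and the function name `name_level_estimate` and the 0.9 threshold are hypothetical choices, not the behaviour of any particular tool. Note that the sketch inherits exactly the limitation discussed above: a binary reference table cannot represent non-binary people at all.

```python
from typing import Dict, Optional, Tuple

# Hypothetical reference table: name -> (records listed as women, records listed as men).
# The numbers are invented for illustration only.
REFERENCE_COUNTS: Dict[str, Tuple[int, int]] = {
    "sara": (980, 20),
    "andrea": (550, 450),
}


def name_level_estimate(name: str, threshold: float = 0.9) -> Tuple[str, Optional[float]]:
    """Return a (label, share) pair describing the NAME, not any individual person.

    The label is "women" or "men" only when the share of reference records
    reaches the threshold; otherwise it is "unknown".
    """
    counts = REFERENCE_COUNTS.get(name.strip().lower())
    if counts is None:
        # The name is missing from the reference table, so nothing can be said.
        return ("unknown", None)

    women, men = counts
    total = women + men
    if total == 0:
        return ("unknown", None)

    share_women = women / total
    share_men = men / total
    if share_women >= threshold:
        return ("women", share_women)
    if share_men >= threshold:
        return ("men", share_men)
    # Ambiguous name: keep the larger share, but refuse to assign a label.
    return ("unknown", max(share_women, share_men))


sara_label, sara_share = name_level_estimate("Sara")
andrea_label, andrea_share = name_level_estimate("Andrea")
missing_label, missing_share = name_level_estimate("Xiu")
```

Keeping an explicit "unknown" outcome, rather than forcing every name into one of two boxes, at least makes the uncertainty visible in the results.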

Moreover, algorithms that assign gender to names have another huge problem: non-Western names. Even if we focus exclusively on technical issues, these algorithms are unable to assign gender to numerous Asian names and work much better in English-speaking contexts. Thus, if we assign gender to 80% of our British names but only to 20% of Indian ones, are the results we are getting representative of those two countries, or only of the former? Furthermore, what happens with people whose name does not belong to the list of “traditional” names of the country they live in? Are we also reinforcing geographical stereotypes every time we use these algorithms?
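One simple way to make this coverage problem visible is to report, for each country, the share of names that received a gender label at all. The sketch below is only an illustration under stated assumptions: the function name `coverage_by_country`, the "unknown" label for unassigned names, and the toy records are all hypothetical, chosen to mirror the 80% versus 20% example above.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def coverage_by_country(records: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Return, per country, the fraction of records with an assigned label.

    Each record is a (country, label) pair; the label "unknown" means the
    algorithm could not assign a gender to that name.
    """
    totals: Dict[str, int] = defaultdict(int)
    assigned: Dict[str, int] = defaultdict(int)
    for country, label in records:
        totals[country] += 1
        if label != "unknown":
            assigned[country] += 1
    return {country: assigned[country] / totals[country] for country in totals}


# Toy data invented for illustration: 4 of 5 UK names assigned, 1 of 5 Indian names.
records = (
    [("UK", "women")] * 2
    + [("UK", "men")] * 2
    + [("UK", "unknown")]
    + [("India", "men")]
    + [("India", "unknown")] * 4
)

coverage = coverage_by_country(records)
for country, share in coverage.items():
    print(f"{country}: {share:.0%} of names assigned")
```

Reporting these shares next to any gender breakdown lets readers judge for themselves how far the results can be trusted for each country.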

I obviously do not have the answer to any of these issues. I just want to talk about them and discuss them, as I believe this is how science finds answers to its problems. I also want us researchers to be aware of these issues and to take them seriously in our daily research. If there is a limitation that we do not have a solution for yet, it is better to state it in our research than to simply ignore the fact that assigning gender to names has its issues. We could say the goal of this small blog entry is to think and reflect on this together. The comments section is open!

Gibney, E. (2019). Discrimination drives LGBT+ scientists to think about quitting. Nature, 571(7763), 16-18.
