Data has the power to do good, but it also has the power to do harm. We often forget that there are people behind data: both the people who control it and the people who are represented by it. Those who control data also hold power over people, which becomes a problem when data is used to reinforce and uphold enduring structural biases against minority groups.
This oppression, caused by the collection, analysis and communication of data, is what Catherine D’Ignazio and Lauren F. Klein seek to challenge with their book, “Data Feminism.” Data feminism recognizes that data and power are not equally distributed, seeks to understand how data contributes to the disparities experienced in everyday life and challenges the distribution of power. It confronts the status quo by thinking about data relative to the direct experience of those who are oppressed, committing to action and embracing intersectionality. Further, data feminism is not only about women. Challenging the system takes more than one gender, and it means confronting all forms of bias, not just gender bias.
So how do we do it? Throughout “Data Feminism,” Catherine and Lauren provide us with seven guiding principles:
- Examine Power
- Challenge Power
- Elevate Emotion and Embodiment
- Rethink Binaries and Hierarchies
- Embrace Pluralism
- Consider Context
- Make Labor Visible
Let’s dive into these.
Examine Power:
When it comes to data science, men dominate the field. When dominant groups control the shaping of data that affects minority groups, the results are biased. The problem is that dominant groups suffer from “privilege hazard”. People who live in privileged positions do not see the problems that minority groups face, so they are “surprised” by bias and cannot account for it in their work. This raises the worry that bias may be hard-coded into machine learning and AI. It is part of the root cause that needs to be targeted: we need to make sure we have equitable data to work with.
This means we need complete data to work with. There is so much missing data that is necessary for solving pivotal issues, yet even data collection is compromised by imbalances in power. Data does not provide equal representation; it is not neutral at all. Further, this scarcity bias means that data science typically serves the goals of those in power, not the goals of minorities. Whether it is used for profit, surveillance or efficiency, data costs money and resources, which only those in power have.
Therefore, examining power is the starting point, because we need to understand who works with data, who benefits from it and most importantly, who gets hurt.
Challenge Power:
Data has always been biased against minorities because of who holds power over it. Therefore, we must push back against unequal power structures in order to create equitable futures.
There are four key steps to taking action. First, provide counterdata (the missing data) to show the oppression that data has resulted in. Second, analyze how bias is disguised within data through factors other than race, and expose how data science then amplifies oppression and limits people’s future potential. Third, imagine a goal of co-liberation: a goal that benefits both dominant and minority groups from the beginning, instead of trying to remove bias from pre-existing structures that are instilled with prejudice. Fourth and finally, teach. Data science is mainly taught to men, not to women or minorities, and students are mainly taught technical skills, not how those skills apply to the ethical questions that are relevant to people’s everyday lives.
Therefore, those who use data in their everyday work need to alter their basic assumptions about working with data, imagine new starting points, and keep learning while teaching future data scientists.
Elevate Emotion and Embodiment:
Data visualization is taught to be minimalistic and free of emotion so that viewers can interpret the data for themselves. However, a minimal visualization can represent a partial perspective that does not communicate the whole story. Rhetoric is already present in all design, even neutral design, so it is impossible to avoid interpretation in even the simplest charts; what you choose to include or highlight in a chart influences how the data is interpreted.
The issue is that knowledge is situated, meaning that it is impossible to encompass every perspective in one chart or visualization, because people have different standpoints. While universal objectivity is impossible to achieve, it is possible to disclose one’s positionalities in order to be transparent about one’s knowledge claims.
So how can we embody emotion to communicate data better? Data visceralization is the representation of data in a form that can be experienced by the whole body, physically and emotionally. It tunes into other sensory aspects of the body, increasing accessibility for different learning types. Data performances allow one to experience bias rather than just see it. This also helps designers convey aspects of data that they may never be able to put in a chart. Emotion expands the data toolbox while providing context and challenging data bias.
Rethink Binaries and Hierarchies:
One of the biggest data collection failures is not accounting for non-binary gender, a crucial piece of missing data, which leads to inaccurate representations of gender. Data must be classified in some way, as classification is essential to any infrastructure. The problem is that many classifications have hidden binary and hierarchical systems behind them that seem so natural that no one questions them.
Non-binary data must be collected and included to fairly represent people, but the visibility that comes with being counted also puts people who are non-binary at risk of different kinds of discrimination. So what can be done? This has raised many questions about counting people of a non-binary gender, including whether they should only be counted when they have given consent. Individuals should be able to refuse being counted in light of potential harm. Further, when counting any minoritized group, one should always balance the harms and benefits relative to that group.
Of course, counting can also be empowering when communities come together and take over the power of counting to create data. While classification can dominate and exclude people, if binaries and hierarchies are rethought, it can rebalance unequal distributions of power.
Embrace Pluralism:
To combat oppression, it is essential to incorporate multiple perspectives into all stages of the data science process, from collection to analysis and communication. However, these different voices are typically lost and suppressed in the data cleaning process, which is often said to make up 80% of data analysis.
The idea of clean data has a tainted history and is ultimately a “diversity-hiding trick”: it removes any outliers that do not fit the “norm”. People tend to choose cleaner data over more accurate data that represents multiple perspectives. However, it is the latter that provides a more complete picture of the issue at hand and a better understanding of the world.
How can we embrace pluralism? Work towards the goal of co-liberation, get communities involved and counteract “privilege hazard”. Co-liberation provides a transfer of knowledge and builds a proper social infrastructure. Because data for co-liberation has to represent many views and local knowledge, it may take longer to scale, but it will produce better-quality data that encompasses more perspectives.
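The concern about cleaning as a “diversity-hiding trick” can be made concrete with a small check. The following is a minimal sketch, not a method from the book: it assumes a hypothetical toy dataset of (group, value) pairs and a simple z-score outlier rule, and it reports how many rows from each group the rule would remove. The names `records`, `removed_by_group` and `z_threshold` are illustrative assumptions.

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical toy records: (group label, measured value). Not real data.
records = [
    ("majority", 10.0), ("majority", 11.0), ("majority", 9.0),
    ("majority", 10.5), ("majority", 9.5), ("majority", 10.0),
    ("majority", 11.0), ("majority", 9.0),
    ("minority", 18.0), ("minority", 19.0),
]


def removed_by_group(rows, z_threshold=1.5):
    """Return {group: (rows removed, total rows)} for a z-score outlier filter."""
    values = [value for _, value in rows]
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation of all values

    # A row counts as an "outlier" if it lies more than z_threshold
    # standard deviations away from the overall mean.
    dropped = [
        group for group, value in rows
        if sigma > 0 and abs(value - mu) / sigma > z_threshold
    ]

    totals = Counter(group for group, _ in rows)
    removed = Counter(dropped)
    return {group: (removed.get(group, 0), totals[group]) for group in totals}


if __name__ == "__main__":
    for group, (gone, total) in removed_by_group(records).items():
        print(f"{group}: {gone} of {total} rows removed")
```

On this toy data, the filter removes both rows from the smaller group and none from the larger one, because the "norm" is defined by the group with the most rows. Running a per-group count like this before discarding outliers is one possible way to notice when cleaning erases a perspective.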
Consider Context:
Datasets have become more easily accessible to the public, which is good, but this makes the context of data very important. One must ask questions about the social, cultural and historical background of the data. Most datasets on the web don’t provide explanations or sources, which can be a problem if the data lead to biased results. We cannot take data at face value; we must require that it come with context.
Sometimes context can be found while analyzing the data, and discovering what is missing is a powerful insight in itself. Communicating context is also important. When bias is clearly evident in data, it must be clearly communicated in writing as well. Those who work with data have to become better at recognizing, naming and talking about the structural forces of oppression, and at providing context.
While there have been developments in tools for context, such as data sheets and user guides that accompany datasets, is this enough? There needs to be as much investment in providing context for data as there is in publishing it.
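As one illustration of what a lightweight context record could look like, here is a minimal sketch in Python. It is not a standard data sheet format: the class `DatasetContext`, its field names and the helper `missing_context` are hypothetical and chosen only to mirror the questions raised above (who collected the data, why, when, about whom, and what is missing).

```python
from dataclasses import dataclass, fields


@dataclass
class DatasetContext:
    """Hypothetical context record; the field names are illustrative only."""
    collected_by: str = ""         # who gathered the data
    collection_purpose: str = ""   # why it was gathered
    time_period: str = ""          # when it was gathered
    populations_covered: str = ""  # who is represented in it
    known_gaps: str = ""           # who or what is missing
    source: str = ""               # where the data came from


def missing_context(ctx):
    """List the names of context fields that were left empty."""
    return [f.name for f in fields(ctx) if not getattr(ctx, f.name).strip()]


if __name__ == "__main__":
    # Example values are placeholders, not a real dataset.
    ctx = DatasetContext(collected_by="City agency", source="Open data portal")
    print("Unanswered context questions:", missing_context(ctx))
```

A check like this does not supply the context itself. It only makes the unanswered questions visible before the dataset is published or reused.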
Make Labor Visible:
A major problem with the data supply chain is that it involves a lot of invisible work. While visible labor is rewarded and valued, invisible work, like cooking, cleaning and childcare, is not, because it happens at home or out of sight. Data science profits from unpaid and underpaid work by using crowdsourced data and paying on-demand labor below minimum wage. This mirrors the familiar hierarchies of gender, class and race.
So, what can be done? Examining data production and tracing it back to its human sources is the first step. Next, work must be credited throughout the entire life cycle of a data project. Further, crediting emotional and care labor, which is mainly done by women for children and the elderly, is a must. While attention to care work is increasing as more jobs become virtual and out of sight, the stress and value of emotional labor continue to go unnoticed. We must work with domestic workers to fight these structural inequalities.
These seven principles outline thoughts, perspectives and actions that are already appearing across groups and organizations that are starting to challenge the status quo when it comes to working with data. However, issues of privilege and power continue to exist. Therefore, we must monitor dominant groups, collaborate with communities, embrace different perspectives, collect data properly, make labor visible and show our work.
These seven principles are one starting point for ending oppression. It is time to keep challenging power and moving towards an equitable future.