Exploratory Data Analysis: Covariation of Two Categorical Variables

Missing Migrants Project

In the past few years, there has been a huge refugee crisis worldwide as migrants leave their home countries in hopes of better lives. They leave for a variety of reasons, from natural disasters, poverty, violence, and war to dissatisfaction with the state of their country. Unfortunately, not all migrants reach their destination; many go missing or die along the way.

This led to the Missing Migrants Project, which tracks the deaths of migrants, including refugees, who have gone missing along major migration routes worldwide.

Data like this is important because it paints a picture of the conditions and trends of migration routes and, further, of the risks associated with some of them. To understand just how dangerous life is for migrants and refugees trying to escape their homes, I want to take a look at the top causes of death along the most common migration routes.

To do so, I will use the Missing Migrants Project data set from Kaggle, which contains data from 2014 through June 2017.

To understand the relation between certain causes of death and the corresponding incident region, we will need to visualize the data in order to see their covariation. But how do we visualize two categorical variables?

A convenient way to plot two categorical variables in R is with the geom_count() function from the ggplot2 package. Click here for reference code. Now, let’s see what this plot looks like:
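As a rough sketch of how a plot like this might be produced (this is not the reference code, and the toy records and the column names region and cause are made up for illustration rather than taken from the Kaggle data set):

```r
library(ggplot2)

# Made-up toy records, NOT the Kaggle data; the column names are assumptions.
migrants <- data.frame(
  region = c("Mediterranean", "Mediterranean", "Mediterranean",
             "U.S./Mexico Border", "U.S./Mexico Border", "North Africa"),
  cause  = c("Drowning", "Drowning", "Presumed drowning",
             "Drowning", "Unknown (skeletal remains)", "Vehicle accident")
)

# geom_count() counts the rows at each (x, y) pair and maps
# that count, n, to the size of the point drawn there.
p <- ggplot(migrants, aes(x = region, y = cause)) +
  geom_count() +
  labs(x = "Incident region", y = "Cause of death", size = "n")

print(p)
```

With real data, the same call should work once the two categorical columns are swapped in for region and cause.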

In the plot above, we can see the top 10 incident regions plotted on the x-axis against the top 5 causes of death on the y-axis. The size of each circle displays the count of observations (n) that occurred at each combination of x and y values. Covariation appears strongest where the count of observations is highest, that is, where the circles are largest.
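The counts behind the circles can also be tabulated directly. The sketch below uses base R only and, again, made-up toy records with assumed column names; top_levels() is a hypothetical helper written for this example, and the cutoffs are 2 regions and 3 causes instead of 10 and 5 because the toy data is so small.

```r
# Made-up toy records, NOT the Kaggle data; the column names are assumptions.
migrants <- data.frame(
  region = c("Mediterranean", "Mediterranean", "Mediterranean", "Mediterranean",
             "U.S./Mexico Border", "U.S./Mexico Border", "U.S./Mexico Border",
             "North Africa"),
  cause  = c("Drowning", "Drowning", "Presumed drowning", "Presumed drowning",
             "Drowning", "Unknown (skeletal remains)", "Unknown (skeletal remains)",
             "Vehicle accident")
)

# Hypothetical helper: names of the k most frequent values in a vector.
top_levels <- function(x, k) {
  head(names(sort(table(x), decreasing = TRUE)), k)
}

top_regions <- top_levels(migrants$region, 2)
top_causes  <- top_levels(migrants$cause, 3)

# Keep only rows whose region and cause are both among the most frequent.
top_rows <- migrants[migrants$region %in% top_regions &
                       migrants$cause %in% top_causes, ]

# Cross-tabulate the two variables; each nonzero cell corresponds to one
# circle in the count plot, and n is what sets that circle's size.
counts <- as.data.frame(
  table(region = top_rows$region, cause = top_rows$cause),
  responseName = "n"
)
counts <- counts[counts$n > 0, ]

print(counts)
```

Filtering to the most frequent categories before plotting keeps the axes readable when a variable has many rare values.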

We can clearly see which causes of death are associated with certain regions along migration routes. It’s not surprising that the Mediterranean had the highest incidences of both drowning and presumed drowning. The Mediterranean was even the reason the Missing Migrants Project started back in October 2013, when 368 migrants died in two shipwrecks off the coast of an Italian island.

While the U.S./Mexico Border also had several instances of drowning, it is the region known for having the most unknown skeletal remains.

Lastly, in North Africa, we can see several instances of migrants dying from sickness without access to medication and in vehicle accidents along the way.

The data undoubtedly shows the unfortunate fates of migrants and refugees around the world. That being said, awareness is just the beginning. Hopefully, further investigation into data like this can be used to improve conditions and prevent the loss of innocent lives that we witness today.
