Data Point Shapes and Colors with ggplot2

Data Visualization in R

A couple of weeks ago, someone on Twitter asked how to change the shape of the circle data points on a plot in R. This made me realize that while I usually change size or color to differentiate data points, I don’t really use different shaped data points in my plots AND I don’t even know what the different shapes are. Therefore, I put together this post to show how to change the shapes and colors of data points in a ggplot2 geom_point() plot (a scatterplot in R) and show you what different shapes you can use.

For this data visualization example, we will be using Kaggle’s World Happiness Report dataset from 2015. Specifically, we will look at a scatterplot of Freedom vs. Happiness Scores for four regions: Western Europe, Central & Eastern Europe, the Middle East and Northern Africa, and Sub-Saharan Africa. (It would be interesting to see whether the relationship between these variables differs among these regions, even though they are geographically close to each other.) Therefore, I have filtered the dataset to include only data for these regions.
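The filtering step itself isn’t shown in this post. A minimal base-R sketch of it could look like this. It assumes the 2015 Kaggle file was read with read.csv(), so the region column is called Region, and that the region labels are spelled as below (e.g. “Central and Eastern Europe”). The names filter_regions() and study_regions are hypothetical.

```r
# Hypothetical helper: keep only the rows whose Region is in `regions`
filter_regions <- function(happiness, regions) {
  keep <- happiness$Region %in% regions
  # droplevels() removes factor levels that no longer occur (character columns are unaffected)
  droplevels(happiness[keep, , drop = FALSE])
}

# Region labels as they are assumed to appear in the 2015 Kaggle file
study_regions <- c("Western Europe",
                   "Central and Eastern Europe",
                   "Middle East and Northern Africa",
                   "Sub-Saharan Africa")
```

Under those assumptions, happinessRegions <- filter_regions(read.csv("2015.csv"), study_regions) would produce a data frame like the one plotted below. The file name 2015.csv is also an assumption.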

ggplot2 is a very popular data visualization package for R, and its geom_point() function makes it easy to create a scatterplot of two variables. Let’s take a look at a basic scatterplot of Freedom vs. Happiness Score for the regions listed above:

library(ggplot2) #loads ggplot2

ggplot(happinessRegions, aes(Freedom, Happiness.Score)) +
  geom_point() + #creates point plot
  theme_minimal() + #changes plot theme
  ggtitle("Freedom vs. Happiness (2015)") +  #adds title
  labs(x = "Freedom", y = "Happiness Score") #sets axis titles

In the scatterplot above, we can see a positive correlation between Freedom and Happiness. However, we wanted to know whether this relationship differs among the regions listed earlier. We cannot tell the regions apart in this scatterplot, but we can if we color the data points by region. This is as simple as mapping color to Region inside the aes() aesthetic function within geom_point():

ggplot(happinessRegions, aes(Freedom, Happiness.Score)) +
  geom_point(aes(color = Region)) + #specify color to region within aes()
  theme_minimal() +
  ggtitle("Freedom vs. Happiness (2015)") + 
  labs(x = "Freedom", y = "Happiness Score")

With each region differentiated by color, we get a better sense of how Freedom and Happiness vary across regions. From the scatterplot above, we can see that the region shown in purple, Western Europe, had the highest Freedom and Happiness Scores, while the region shown in blue, Sub-Saharan Africa, had the lowest.
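Where do the purple and blue come from? Here is a small sketch. It assumes the default discrete color scale (scale_color_hue(), which draws its colors from scales::hue_pal()) and the region spellings used in the Kaggle file. ggplot2 sorts the Region values alphabetically and gives each one an evenly spaced hue.

```r
library(scales)

# Region labels, assumed to match the spelling in the Kaggle file
regions <- c("Western Europe", "Central and Eastern Europe",
             "Middle East and Northern Africa", "Sub-Saharan Africa")

# A character variable is treated as a factor with alphabetically sorted levels
region_levels <- sort(regions)

# The default discrete color scale gives each level an evenly spaced hue
default_colors <- setNames(hue_pal()(length(region_levels)), region_levels)
print(default_colors)
```

Under these assumptions, Western Europe gets #C77CFF (a purple) and Sub-Saharan Africa gets #00BFC4 (a blue-green). These match the colors described in the paragraph above.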

Even with colored data points, it is hard to tell groups apart where points overlap; this is where shapes come in. Just as we mapped color to region in the previous scatterplot, we can also map shape to region. Just add a shape argument inside aes() as well:

ggplot(happinessRegions, aes(Freedom, Happiness.Score)) +
  geom_point(aes(color = Region, shape = Region)) + #specify shape to region within aes()
  theme_minimal() +
  ggtitle("Freedom vs. Happiness (2015)") + 
  labs(x = "Freedom", y = "Happiness Score")

And just like that, we have changed the shapes of our data points. Notice that we did not specify which shapes to use; ggplot2 assigned them automatically. In R, each point shape is identified by a number from 0 to 25 (a single character can also be used as a shape). So to choose specific shapes for our data points, we specify the numbers of the shapes we want. Let’s take a look at the different shapes that we can use:
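To see each shape number next to the shape it draws, you can generate a reference chart in ggplot2 itself. The sketch below is similar in spirit to the shape chart in ggplot2’s aesthetic specifications vignette. The data frame name shapes and the six-column grid are illustrative choices. Shapes 21 to 25 have both a border (color) and an interior (fill), which is why fill = "red" is set.

```r
library(ggplot2)

# One row per point shape code (0 to 25), laid out on a grid six columns wide;
# negative row numbers put shape 0 in the top row
shapes <- data.frame(
  shape = 0:25,
  x = 0:25 %% 6,
  y = -(0:25 %/% 6)
)

shape_chart <- ggplot(shapes, aes(x, y)) +
  # scale_shape_identity() below makes ggplot2 use these numbers as the shapes themselves
  geom_point(aes(shape = shape), size = 5, fill = "red") +
  # Print each shape's number just below it
  geom_text(aes(label = shape), nudge_y = -0.3) +
  scale_shape_identity() +
  theme_void()

print(shape_chart)
```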

If we were working with one group of data, we could set a single shape by giving its number directly to geom_point(), outside aes() (for example, geom_point(shape = 17)). However, we are looking at four different regions, so we will use scale_shape_manual() to set the shape for each one. Let’s change our shapes to 5, 7, 10 and 13:

ggplot(happinessRegions, aes(Freedom, Happiness.Score)) +
  geom_point(aes(shape = Region, color = Region)) +
  theme_minimal() +
  ggtitle("Freedom vs. Happiness (2015)") + 
  labs(x = "Freedom", y = "Happiness Score") +
  scale_shape_manual(values = c(5, 7, 10, 13)) #specify individual shapes
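An aside on the single-group case mentioned above: a fixed shape (and, likewise, a fixed color) goes directly into geom_point(), outside aes(). Inside aes(), ggplot2 would treat the value as a variable to map rather than as a setting. This minimal sketch uses R’s built-in mtcars data as a stand-in, because happinessRegions is not bundled with R. Shape 17 and the color are arbitrary picks.

```r
library(ggplot2)

# mtcars stands in for happinessRegions so this sketch runs on its own
single_group <- ggplot(mtcars, aes(wt, mpg)) +
  # Fixed settings go outside aes(): every point is a purple filled triangle (shape 17)
  geom_point(shape = 17, color = "#9532a8", size = 3) +
  theme_minimal() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")

print(single_group)
```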

And now we have different shapes! The final step is to assign specific colors. Again, if we were working with one group of data, we could set a single color directly in geom_point(), outside aes() (for example, geom_point(color = "red")), but we have four groups. So, just as we used scale_shape_manual() to assign shapes, we can use scale_color_manual() to assign colors. For colors beyond basic names like “red” or “blue”, you can use the hex code of the color you want (hex color pickers are easy to find online). For now, let’s change them to #C97B84, #A5D8FF, #9CC4B2 and #9532a8:

ggplot(happinessRegions, aes(Freedom, Happiness.Score)) +
  geom_point(aes(shape = Region, color = Region)) +
  theme_minimal() +
  ggtitle("Freedom vs. Happiness (2015)") + 
  labs(x = "Freedom", y = "Happiness Score") +
  scale_shape_manual(values = c(5, 7, 10, 13)) +
  scale_color_manual(values = c("#C97B84", "#A5D8FF", "#9CC4B2", "#9532a8")) #specify individual colors
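When values is an unnamed vector, scale_shape_manual() and scale_color_manual() match it to the regions by position, in the order of Region’s levels (alphabetical by default). So #C97B84 and shape 5 go to Central and Eastern Europe, and #9532a8 and shape 13 go to Western Europe. If the levels are reordered or a region drops out of the data, the pairing shifts. Giving values as a named vector instead ties each shape and color to a region by name. The sketch below assumes the names match the Kaggle region labels exactly. region_colors, region_shapes and plot_regions() are hypothetical names.

```r
library(ggplot2)

# Hypothetical named vectors: each name must match a Region label exactly
region_colors <- c("Central and Eastern Europe"      = "#C97B84",
                   "Middle East and Northern Africa" = "#A5D8FF",
                   "Sub-Saharan Africa"              = "#9CC4B2",
                   "Western Europe"                  = "#9532a8")
region_shapes <- c("Central and Eastern Europe"      = 5,
                   "Middle East and Northern Africa" = 7,
                   "Sub-Saharan Africa"              = 10,
                   "Western Europe"                  = 13)

# Hypothetical wrapper around the plot above; values are now matched by name, not position
plot_regions <- function(data) {
  ggplot(data, aes(Freedom, Happiness.Score)) +
    geom_point(aes(shape = Region, color = Region)) +
    theme_minimal() +
    ggtitle("Freedom vs. Happiness (2015)") +
    labs(x = "Freedom", y = "Happiness Score") +
    scale_shape_manual(values = region_shapes) +
    scale_color_manual(values = region_colors)
}
```

With all four regions present, plot_regions(happinessRegions) should give the same assignments as the plot above. Because shape and color map the same variable with the same title, ggplot2 still combines them into a single legend.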

Now we have gone through how to change the shapes and colors of data points in a ggplot2 geom_point() plot. I hope this helps the next time you want to change the shapes and colors of your data points in R. Click Here for reference code.
