Which Wine Varieties are the Most Affordable?
In my recent post about Variation, I used Kaggle’s Wine Reviews data set to explore the variation within wine varieties, specifically to find the most common ones; the most common turned out to be Chardonnay. I then looked at the variation in Chardonnay prices and found that the outliers in the data set may have been entered in error, a great example of why exploratory data analysis is needed for any project. Now, instead of looking at the variation in one variable, in this post I want to use the data to see the … Continue reading Exploratory Data Analysis: Covariation of a Categorical and a Continuous Variable
On June 19th, I attended the R-Ladies Lightning Talks event, a series of 5-minute talks presented by several members of the R-Ladies community. This event provided a great way to learn about some of the organization’s members, and how … Continue reading R-Ladies Lightning Talks
Missing Migrants Project
In the past few years, there has been a huge refugee crisis worldwide as migrants leave their home countries in hopes of better lives. They leave for a variety of reasons, from natural disasters, poverty, violence, and war to dissatisfaction with the state of their country. Unfortunately, not all migrants reach their destination; many go missing and die along the way. This led to the Missing Migrants Project, which tracks the deaths of migrants, including refugees, who have gone missing along major migration routes worldwide. Data like this is important because … Continue reading Exploratory Data Analysis: Covariation of Two Categorical Variables
Freedom and Happiness
This week I came across the World Happiness Report, an annual survey that ranks 156 countries by how happy their citizens perceive themselves to be. The report also measures positive and negative emotion and considers six key explanatory factors: social support, freedom, corruption, generosity, GDP, and life expectancy. I want to explore the correlation between freedom (the freedom to make life choices) and positive affect (a measure of positive emotion). More specifically, I want to see which countries have the happiest citizens and which do not, based on the freedom that … Continue reading Exploratory Data Analysis: Covariation of Two Continuous Variables
A Look at Variation in Wine Data
If you read my last post on Exploratory Data Analysis, then you know that there are many ways to explore a data set. If you haven’t read it yet, pause and read it first. This post covers how looking at the variation within a variable can reveal interesting information about the data you are working with. Variation describes how the values of a variable change from one measurement to another. Because of this, each variable has its own unique pattern, and the only way to see … Continue reading Exploratory Data Analysis: Variation
On Thursday, I attended my first R-Ladies NYC meetup! R-Ladies NYC is an organization that promotes gender diversity in the R community by organizing a series of events (including this meetup) to support women who want to learn R or … Continue reading Data Science and Fantasy Baseball?
When working in RStudio, there are some workflow basics and tips that, when followed correctly, will make data science projects easier to carry out and follow in the long run. In this post, I will be covering how to assign objects, … Continue reading RStudio: Basics and Tips
Online Search Interest for “Flowers”
It’s no surprise that every year, online search interest for “Flowers” peaks around Mother’s Day, as people look for the perfect flowers for Mom. Take a look at the time series plot … Continue reading Display Data on a World Map with rworldmap
A Look at the Average Price of Avocados in New York
When working on any data science project, you will need to transform the data into a form that is suitable to work with. This can include creating new variables, renaming variables or … Continue reading Data Transformation with dplyr
Every data scientist has a preferred programming language to work with. I was first introduced to the R programming language in college, and even though it has been five years since then, RStudio has made it very easy to pick up right where I left off. What is RStudio? RStudio is an integrated development environment (IDE) for R. An IDE is a software application that consolidates the basic tools programmers need into one place. In the case of RStudio, it provides a user-friendly environment to learn and practice the R programming language. The RStudio IDE has four main sections. The upper left is the … Continue reading What is RStudio?