Before you can work with a data set in RStudio, you will first need to import your data into RStudio. In this post I will show exactly how to read in data sets from typical files you may work with including: excel, text and csv files.
Before writing any code, make sure that your data file is saved in the same file path (or folder) as your RStudio file. Otherwise, RStudio will not be able to find it and you will get an error.
Read in Data from an Excel File
Reading in data from an excel file is simple. First, install and load the “readxl” package with the install.packages() and library() functions. The “readxl” package makes it easy to get data out of an excel file and into R and supports both the .xls and .xlsx formats.
Then, you need to assign the excel file name to the name of your data frame using the read_excel() function. Your data frame (df) is the table where you store the data, and can be named anything you want. Here my data frame is called df_xls:
install.packages(“readxl”)
library(readxl)
df_xls <- read_excel(“filename.xls”)
To check your data frame, you can call the head() function to preview the first few rows or call the summary() function to view summary statistics of your data frame:
head(df_xls)
summary(df_xls)
Read in Data from a Text File
R’s read.table() function reads in a file in table format and creates a data frame from it. This can be used to read in data tables from text files. However, to ensure that your variable names (or column names) are identified separately from your data observations, you will need to set “header=TRUE”:
df_txt = read.table(“filename.txt”, header = TRUE)
head(df_txt)
summary(df_txt)
Read in Data from a CSV File
Another common data source is the csv file, also known as the comma separated values file, where data values in the file are separated by commas. To read in a csv file, you use the read.csv() function. Since all of the data values are separated by commas, you will need to split them into separate fields using sep= “,” so that the data values can be read by RStudio.
df_csv = read.csv(“filename.csv”, header = TRUE, sep=”,”)
head(df_csv)
summary(df_csv)
Sometimes, however, your csv file may have values separated by semicolons instead of commas. So, always make sure to pay attention to your data set to see what kind of punctuation is being used. If another kind of punctuation is used, just set the “sep” attribute to that punctuation instead:
df_csv = read.csv(“filename.csv”, header = TRUE, sep=”;”)
Read in Data from an Online CSV File
Lastly, sometimes you may encounter a csv file that you can only open online. It is possible to read in data from online csv files as well, but you will need to first install and load the “RCurl” package which allows you to access data that is online:
install.packages(“RCurl”)
library(RCurl)
Once installed, similar to reading in a csv file, you use the read.csv() function. This time, you will need to use RCurl’s getURL() function within the read.csv() function in order to read in the data from the URL:
df_online_csv = read.csv(text=getURL(“url.csv”))
head(df_online_csv)
summary(df_online_csv)
If you would like to follow a long with examples of reading in excel, text and csv files, please click here for reference code and here for the files.