Visualizing Text Data from Tweets in a Word Cloud

Labor Day Tweets

In my last post, I covered how to extract tweets from a specific Twitter account and tweets related to a specific hashtag. If you haven’t read it yet, read it now. But what can you do with this kind of unstructured textual data? One of the first steps in any data analysis is to clean and plot your data, and a word cloud is one of the simplest ways to visualize textual data.

A word cloud depicts the frequency of words used in a set of text: the more common the word, the larger it appears in the word cloud. The larger a word is in the word cloud, the more important the word is considered to be. It’s these words that give the most insight into what the text is about.
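
To make the idea of word frequency concrete, here is a minimal base R sketch that counts words in a few made-up sentences. The sample text and the variable names are assumptions for illustration only; they are not real tweets.

```r
# Toy example: count word frequencies in a few made-up sentences (not real tweets)
sample_text = c("happy labor day", "labor day sale", "big sale today")

# Lowercase the text, split on whitespace, and flatten into one vector of words
words = unlist(strsplit(tolower(sample_text), "\\s+"))

# Tabulate the words and sort so the most frequent come first
word_counts = sort(table(words), decreasing = TRUE)
print(word_counts)
```

In a word cloud built from this text, "labor", "day" and "sale" would be drawn larger than the words that appear only once.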

How do word clouds help us understand tweets on Twitter? Well, if we create a word cloud for tweets regarding a specific hashtag, we will be able to find the most common topics being tweeted with that hashtag. This provides insight into the topic and user behavior. Let’s put this into practice.

In honor of Labor Day in the U.S., let’s make a word cloud for tweets with #LaborDay.

Creating a Word Cloud in R

Since we already know how to extract tweets, we can get right into making a word cloud. First, make sure to have the below packages installed and loaded in R:

#Install Packages
install.packages("tm")
install.packages("wordcloud")
install.packages("RColorBrewer")

#Load Packages
library(tm)
library(wordcloud)
library(RColorBrewer)

Next, we need to extract just the text from our tweets. We can do this with the getText() function in the below code. The sapply() function applies the getText() function to each tweet in our list of Labor Day tweets.

LaborDay_text = sapply(LaborDay, function(x) x$getText())

The next step is to create a corpus, which is a collection of texts:

LaborDay_corpus = Corpus(VectorSource(LaborDay_text))

Once we have our corpus, we then need to be able to get the frequency of the words in our corpus. We can do this by creating a Term Document Matrix (tdm) of our corpus. As we do this, we can also clean our text of any punctuation or numbers and turn all the text into lowercase:

tdm = TermDocumentMatrix(LaborDay_corpus,
   control = list(removePunctuation = TRUE, removeNumbers = TRUE, tolower = TRUE))
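
If it helps to see what a term document matrix holds, the sketch below builds a tiny one by hand in base R. It is only an approximation for illustration: the sample strings and names such as toy_tdm are made up, and tm applies further defaults (such as a minimum word length) that this sketch does not reproduce.

```r
# Made-up stand-ins for tweet text (hypothetical, not real data)
docs = c("Labor Day SALE!", "Happy Labor Day 2019", "sale sale sale")

# Mirror the cleaning options: lowercase, then drop punctuation and digits
clean = tolower(docs)
clean = gsub("[[:punct:][:digit:]]", "", clean)

# Tokenize each document on whitespace
tokens = strsplit(trimws(clean), "\\s+")

# Rows are terms, columns are documents, cells are counts
terms = sort(unique(unlist(tokens)))
toy_tdm = sapply(tokens, function(tok) table(factor(tok, levels = terms)))
rownames(toy_tdm) = terms
colnames(toy_tdm) = paste0("doc", seq_along(docs))
print(toy_tdm)
```

Each row is a word and each column is one piece of text, so summing across a row gives that word's total frequency.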

Lastly, we need to get our words and their frequencies into a data frame so that we can create our word cloud.

First, we assign our tdm matrix to variable m; this will make it easier to write code later. Second, we sort the sums of the rows in our matrix to find the word frequencies in decreasing order. Then we create our data frame with words and their frequencies:

#Define tdm as matrix
m = as.matrix(tdm)

#Get word frequency from m in decreasing order
word_frequency = sort(rowSums(m), decreasing=TRUE) 

#Create a data frame with words and their frequencies
LaborDayTweetsFrequency = data.frame(word=names(word_frequency), freq=word_frequency)
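
The same three steps can be tried on a small hand-made matrix to see what the resulting data frame looks like. The counts and the names m_toy, toy_frequency and toy_df are hypothetical and used only for this sketch.

```r
# Small hand-made term-by-document count matrix (hypothetical values)
m_toy = matrix(c(1, 1, 0,
                 0, 1, 0,
                 1, 1, 0,
                 1, 0, 3),
               nrow = 4, byrow = TRUE,
               dimnames = list(c("day", "happy", "labor", "sale"),
                               c("doc1", "doc2", "doc3")))

# Sum across documents, then sort so the most frequent term is first
toy_frequency = sort(rowSums(m_toy), decreasing = TRUE)

# Same two-column layout as above: one row per word
toy_df = data.frame(word = names(toy_frequency), freq = toy_frequency)
print(toy_df)
```

Here "sale" ends up in the first row with a frequency of 4, since it appears once in the first document and three times in the third.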

And now, we can finally make our word cloud with the wordcloud() function from our wordcloud package:

wordcloud(LaborDayTweetsFrequency$word, LaborDayTweetsFrequency$freq, random.order=FALSE, colors=brewer.pal(8, "Spectral"))
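
With a large set of tweets, the cloud can get crowded with rare words. One option is to trim the frequency table before plotting. The sketch below shows the idea in base R; the data, the thresholds and the names freq_df, min_count and max_terms are assumptions chosen for illustration. The wordcloud() function also has its own min.freq and max.words arguments that serve a similar purpose.

```r
# Hypothetical frequency table in the same shape as LaborDayTweetsFrequency
freq_df = data.frame(word = c("sale", "laborday", "giveaway", "the", "rt"),
                     freq = c(40, 35, 12, 3, 1))

# Keep words seen at least min_count times, capped at the top max_terms rows
min_count = 5
max_terms = 2
trimmed = freq_df[freq_df$freq >= min_count, ]
trimmed = head(trimmed[order(-trimmed$freq), ], max_terms)
print(trimmed)

# trimmed$word and trimmed$freq can then be passed to wordcloud() as before
```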

From the word cloud, we can see one clear theme surrounding shopping, with keywords like “giveaway”, “sale”, “egift”, “bfadsamazon” and more. It’s no surprise that this is the dominant theme among #LaborDay tweets, as it is one of the big sale weekends of the year.

Looking at the finer text in the word cloud, we can see words related to celebration, food and fun; everything needed for a perfect Labor Day weekend.

If you look really closely, you’ll find words like “hurricane” and “florida” referencing Hurricane Dorian, which may soon make landfall in Florida. I do wish safety to those who have already been affected or may soon be affected by Hurricane Dorian.

As you can see, a word cloud of tweets can paint a comprehensive picture of a single hashtag. This includes trends that we might expect as well as some that we might not. It highlights what is important in the landscape versus topics that are not mentioned as much. Overall, it is a really great way to understand user behavior. So, I hope you enjoy exploring tweets through word clouds!

Lastly, if you would like to save your word cloud as a PNG, you can do that with the code below:

png("LaborDayTweetsWordCloud.png", width=12, height=8, units="in", res=300)
wordcloud(LaborDayTweetsFrequency$word, LaborDayTweetsFrequency$freq, random.order=FALSE, colors=brewer.pal(8, "Spectral"))
dev.off()

Please click here for reference code. Happy Labor Day!
