Sentiment Analysis on Twitter Data : Text Analytics Tutorial

05 Tuesday Aug 2014

Posted by DataScientist in Uncategorized

Tags

API, Free, India, R, Sentiment Analysis, Tutorial, Twitter

Sentiment Analysis on Twitter Data

Hello all!

In this tutorial I will help you learn how to do “Sentiment Analysis” on Twitter Data.

Suppose we want to learn how users feel about a particular topic or object, such as the Bharatiya Janata Party (BJP), which won India's general elections with a full majority, the first such win in recent decades, to form the central government. We will gauge that sentiment from recent tweets on the microblogging site Twitter.

So our search term will be “BJP”, and we will collect up to 2000 tweets from different users using the Twitter API and several R packages.

As a first step, let us install the packages this tutorial needs:

1. twitteR
2. sentiment
3. plyr
4. ggplot2
5. wordcloud
6. RColorBrewer

First download and install the “Rstem” package from <http://www.omegahat.org/Rstem/> and the “tm” package from CRAN <http://cran.r-project.org/web/packages/tm/index.html>.

Then download the “sentiment” package from <http://cran.r-project.org/web/packages/sentiment/index.html> and install it.

Please note that the “Rstem” and “sentiment” packages are no longer available on CRAN, although they can still be found among the archived packages.

The remaining packages can be installed from CRAN using:

install.packages(c("twitteR", "plyr", "ggplot2", "wordcloud", "RColorBrewer"))
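
Because “Rstem” and “sentiment” are archived, one possible way to install them is directly from the CRAN archive as source packages. The sketch below is only an illustration under stated assumptions: the helper names `archive_url` and `install_archived` are made up for this example, the version numbers in the usage comment (0.4-1 and 0.2) are assumed to be the last archived releases, and building from source requires a working compiler toolchain.

```r
# Build the URL of an archived source tarball on CRAN.
# (archive_url is a hypothetical helper, not part of any package.)
archive_url <- function(pkg, version) {
  sprintf("https://cran.r-project.org/src/contrib/Archive/%s/%s_%s.tar.gz",
          pkg, pkg, version)
}

# Install an archived package from source; repos = NULL tells
# install.packages() to treat its first argument as a file or URL.
install_archived <- function(pkg, version) {
  install.packages(archive_url(pkg, version), repos = NULL, type = "source")
}

# Usage (versions are assumptions; check the archive listing first):
# install.packages("tm")                # needed by sentiment
# install_archived("Rstem", "0.4-1")
# install_archived("sentiment", "0.2")
```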

Once you have installed all the requisite packages, as mentioned above, we are good to go.

# Load the necessary packages
> library(twitteR)
> library(wordcloud)
> library(RColorBrewer)
> library(plyr)
> library(ggplot2)
> library(sentiment)

# Find OAuth settings for twitter:
> library(httr)
> oauth_endpoints("twitter")

## <oauth_endpoint>
## request: https://api.twitter.com/oauth/request_token
## authorize: https://api.twitter.com/oauth/authenticate
## access: https://api.twitter.com/oauth/access_token

# Register an application at https://apps.twitter.com/
# Once registered, note the API key, API secret, access token and access token secret
# Insert these values below:
> api_key <- "your API key from twitter"
> api_secret <- "your Secret key from twitter"
> access_token <- "your Access Token from twitter"
> access_token_secret <- "your Access Token Secret key from twitter"
> setup_twitter_oauth(api_key, api_secret, access_token, access_token_secret)

## [1] "Using direct authentication"
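
If you would rather not paste the keys into a script, one option is to read them from environment variables. This is a minimal sketch, not part of the twitteR package: `read_credential` and the variable names in the usage comment are hypothetical and assume you set those variables yourself (for example in `~/.Renviron`).

```r
# Read one credential from an environment variable and fail early if it is missing.
# (read_credential is a hypothetical helper written for this example.)
read_credential <- function(name) {
  value <- Sys.getenv(name, unset = "")
  if (!nzchar(value)) stop("Environment variable not set: ", name)
  value
}

# Usage, assuming the four (hypothetical) variables were set beforehand:
# setup_twitter_oauth(read_credential("TWITTER_API_KEY"),
#                     read_credential("TWITTER_API_SECRET"),
#                     read_credential("TWITTER_ACCESS_TOKEN"),
#                     read_credential("TWITTER_ACCESS_TOKEN_SECRET"))
```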

# Now let us collect some tweets (up to 2000 in our example) containing the term "BJP" from Twitter.
# We restrict the search to English (lang = "en"); set another language code to fetch tweets in that language.
> bjp_tweets = searchTwitter("BJP", n=2000, lang="en")

# fetch the text of these tweets
> bjp_txt = sapply(bjp_tweets, function(x) x$getText())

# Now we will prepare the above text for sentiment analysis
# First we will remove retweet entities from the stored tweets (text)
> bjp_txt = gsub("(RT|via)((?:\\b\\W*@\\w+)+)", "", bjp_txt)
# Then remove all "@people"
> bjp_txt = gsub("@\\w+", "", bjp_txt)
# Then remove all the punctuation
> bjp_txt = gsub("[[:punct:]]", "", bjp_txt)
# Then remove numbers, we need only text for analytics
> bjp_txt = gsub("[[:digit:]]", "", bjp_txt)
# Then remove links, which are not required for sentiment analysis
# (punctuation is already gone, so each URL is now a single word starting with "http")
> bjp_txt = gsub("http\\w+", "", bjp_txt)
# Finally, collapse runs of spaces and tabs into a single space and trim both ends
> bjp_txt = gsub("[ \t]{2,}", " ", bjp_txt)
> bjp_txt = gsub("^\\s+|\\s+$", "", bjp_txt)
# You can remove anything else you consider noise (slang words, for example) with further gsub() calls like the ones above.
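
To see what the cleaning steps do, it can help to wrap them in a function and try them on a single made-up tweet. The sketch below is a variation on the steps above, not a replacement for them: `clean_tweet` and the sample text are invented for illustration, `perl = TRUE` is passed so that the non-capturing group `(?:...)` is handled by the PCRE engine, and runs of whitespace are replaced by a single space so that neighbouring words are not glued together.

```r
# Apply the cleaning steps, in order, to a character vector of tweets.
# (clean_tweet is a hypothetical helper name used only in this example.)
clean_tweet <- function(x) {
  x <- gsub("(RT|via)((?:\\b\\W*@\\w+)+)", "", x, perl = TRUE)  # retweet entities
  x <- gsub("@\\w+", "", x, perl = TRUE)                        # remaining @mentions
  x <- gsub("[[:punct:]]", "", x, perl = TRUE)                  # punctuation
  x <- gsub("[[:digit:]]", "", x, perl = TRUE)                  # digits
  x <- gsub("http\\w+", "", x, perl = TRUE)                     # links (now one word)
  x <- gsub("[ \t]{2,}", " ", x, perl = TRUE)                   # collapse repeated spaces
  gsub("^\\s+|\\s+$", "", x, perl = TRUE)                       # trim both ends
}

sample_tweet <- "RT @user1: BJP wins 282 seats!  http://t.co/abc123 via @news"
clean_tweet(sample_tweet)
# [1] "BJP wins seats"
```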

# Some words are in lower case and some in upper case, so we convert everything to lower case to make the text uniform.

# tolower() can throw an error on text it cannot handle (for example, invalid multibyte strings), so let us first define a wrapper that returns NA instead of stopping.
> catch.error = function(x)
{
  # default result: a missing value, returned if tolower() fails
  y = NA
  # run tolower() and capture any error instead of stopping
  catch_error = tryCatch(tolower(x), error=function(e) e)
  # if no error was raised, keep the lower-case text
  if (!inherits(catch_error, "error"))
    y = tolower(x)
  # y is the lower-case text, or NA if an error occurred
  return(y)
}

# Now we will transform all the words in lower case using catch.error function we just created above and with sapply function
> bjp_txt = sapply(bjp_txt, catch.error)

# Also we will remove NAs, if any exists, from bjp_txt (the collected and refined text in analysis)
> bjp_txt = bjp_txt[!is.na(bjp_txt)]

# sapply() attached the original text as element names; remove these names, as we do not want them in the sentiment analysis
> names(bjp_txt) = NULL

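The lower-casing, NA removal and name stripping above can also be written more compactly. This is a minimal sketch under assumptions: `safe_tolower` and the two sample strings are made up for illustration, and `vapply()` with `USE.NAMES = FALSE` is used so that no names are attached in the first place.

```r
# Return the lower-case text, or NA if tolower() raises an error.
# (safe_tolower is a hypothetical name; it mirrors the idea of catch.error.)
safe_tolower <- function(x) {
  tryCatch(tolower(x), error = function(e) NA_character_)
}

texts <- c("BJP Wins", "Hello INDIA")

# vapply() guarantees one character value per element; USE.NAMES = FALSE skips the names.
lowered <- vapply(texts, safe_tolower, character(1), USE.NAMES = FALSE)

# Drop any elements that could not be converted.
lowered <- lowered[!is.na(lowered)]
lowered
# [1] "bjp wins"    "hello india"
```
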
# Now the text is fully prepared (or at least for this tutorial) and we are good to go to perform Sentiment Analysis using this text

# As a first step in this stage, let us classify emotions
# In this tutorial we will use the Bayes algorithm to classify emotion categories
# for more, please see the help on classify_emotion (?classify_emotion) in the sentiment package
> bjp_class_emo = classify_emotion(bjp_txt, algorithm="bayes", prior=1.0)
# the above function returns an object of class data.frame with seven columns (anger, disgust, fear, joy, sadness, surprise, best_fit) and one row for each document

# we will fetch the emotion category best_fit for our analysis; readers are encouraged to play around with the other columns as well
> emotion = bjp_class_emo[,7]

# Replace NAs (if any were generated during classification) with the word "unknown"
# The classification can produce NAs because the sentiment package relies on a built-in dataset, "emotions", which contains approximately 1500 words classified into six emotion categories: anger, disgust, fear, joy, sadness, and surprise
# Text made up only of words outside this dataset is classified as NA
> emotion[is.na(emotion)] = "unknown"

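Before moving on, it can be useful to check how many tweets ended up as "unknown". The sketch below uses a small made-up vector in place of the real best_fit column, so the values are purely illustrative.

```r
# Hypothetical best_fit values standing in for the real classification output
best_fit <- c("joy", NA, "anger", "joy", NA)

# Replace missing classifications with the label "unknown"
best_fit[is.na(best_fit)] <- "unknown"

# Count tweets per category and compute the share that could not be classified
emotion_counts <- table(best_fit)
share_unknown <- mean(best_fit == "unknown")
```

A large share of "unknown" suggests that few of the tweet words appear in the emotion dataset.
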
# Similar to above, we will classify polarity in the text
# This process returns four columns for each document:
#   pos      : the absolute log likelihood of the document expressing a positive sentiment
#   neg      : the absolute log likelihood of the document expressing a negative sentiment
#   pos/neg  : the ratio of the absolute log likelihoods of the positive and negative scores, where 1 indicates a neutral sentiment, less than 1 a negative sentiment, and greater than 1 a positive sentiment
#   best_fit : the most likely sentiment category (e.g. positive, negative, neutral) for the given text

> bjp_class_pol = classify_polarity(bjp_txt, algorithm="bayes")

# we will fetch the polarity category best_fit for our analysis; as before, readers are encouraged to play around with the other columns as well
> polarity = bjp_class_pol[,4]

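The pos/neg ratio described above can be read as a simple three-way rule. The sketch below only illustrates that reading of the ratio; `label_from_ratio` is a hypothetical helper, and this is not necessarily how classify_polarity derives best_fit internally.

```r
# Map a pos/neg ratio to a label: above 1 is positive, below 1 is negative, exactly 1 is neutral.
# (label_from_ratio is a made-up helper for illustration.)
label_from_ratio <- function(ratio) {
  ifelse(ratio > 1, "positive",
         ifelse(ratio < 1, "negative", "neutral"))
}

label_from_ratio(c(2.5, 0.4, 1))
# [1] "positive" "negative" "neutral"
```
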
# Let us now create a data frame with the above results obtained and rearrange data for plotting purposes
# creating data frame using emotion category and polarity results earlier obtained
> sentiment_dataframe = data.frame(text=bjp_txt, emotion=emotion, polarity=polarity, stringsAsFactors=FALSE)

# rearrange data inside the frame by sorting it
> sentiment_dataframe = within(sentiment_dataframe, emotion <- factor(emotion, levels=names(sort(table(emotion), decreasing=TRUE))))

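The sorting step above reorders the factor levels by frequency, so that the bars in the plots appear from most to least common. A small sketch with made-up labels shows the effect:

```r
# Made-up emotion labels: joy appears 3 times, anger twice, unknown once
emotion <- c("joy", "anger", "joy", "unknown", "joy", "anger")

# Count each label and sort the counts from most to least frequent
freq <- sort(table(emotion), decreasing = TRUE)

# Use the sorted names as factor levels; plotting functions follow this order
emotion_f <- factor(emotion, levels = names(freq))
levels(emotion_f)
# [1] "joy"     "anger"   "unknown"
```
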
# In the next step we will plot the obtained results (in data frame)

# First let us plot the distribution of emotions according to emotion categories
# We will use the ggplot function from the ggplot2 package (for more, look at the help on ggplot) and the RColorBrewer package
> ggplot(sentiment_dataframe, aes(x=emotion)) + geom_bar(aes(y=..count.., fill=emotion)) +
scale_fill_brewer(palette="Dark2") +
ggtitle('Sentiment Analysis of Tweets on Twitter about BJP') +
theme(legend.position='right') + ylab('Number of Tweets') + xlab('Emotion Categories')

# Similarly, we will plot the distribution of polarity in the tweets
> ggplot(sentiment_dataframe, aes(x=polarity)) +
geom_bar(aes(y=..count.., fill=polarity)) +
scale_fill_brewer(palette="RdGy") +
ggtitle('Sentiment Analysis of Tweets on Twitter about BJP') +
theme(legend.position='right') + ylab('Number of Tweets') + xlab('Polarity Categories')

# Finally, we will now separate the text (the words) according to emotions (categories) and visualize these words with a comparison cloud (using “wordcloud” Package)

# First, separate the words according to emotions
> bjp_emos = levels(factor(sentiment_dataframe$emotion))
> n_bjp_emos = length(bjp_emos)
> bjp.emo.docs = rep("", n_bjp_emos)
> for (i in 1:n_bjp_emos)
{
  tmp = bjp_txt[emotion == bjp_emos[i]]
  bjp.emo.docs[i] = paste(tmp, collapse=" ")
}
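
The loop above builds one long "document" per emotion by pasting together all tweets with that label. The same grouping can be sketched with `split()` on a few made-up tweets (the texts and labels below are invented for illustration):

```r
# Made-up cleaned tweets and their emotion labels
txt <- c("great win", "so angry", "happy day")
emo <- c("joy", "anger", "joy")

# split() groups the tweets by label; paste() then joins each group into one string
docs <- vapply(split(txt, emo), paste, character(1), collapse = " ")
docs
#                anger                  joy
#           "so angry" "great win happy day"
```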

# Here is a hitch. Please note that the emotion categories may contain words you do not want in the word cloud.
# Just as I suggested removing slang words earlier in this tutorial, here you can remove
# unwanted words by supplying them as stopwords.
# For example, stopwords("english") from the "tm" package returns a list of common English words (such as "the" and "is");
# here is how we remove them so that they do not appear in the word cloud:
> bjp.emo.docs = removeWords(bjp.emo.docs, stopwords("english"))

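To make the idea of stop-word removal concrete, here is a base-R approximation that does not need the "tm" package. It is only a sketch: `remove_words` and `my_stopwords` are made-up names, the three-word list stands in for the much longer list that stopwords("english") provides, and words containing regular-expression metacharacters are not handled.

```r
# Remove whole-word occurrences of the given words, then tidy up the spacing.
# (remove_words is a hypothetical helper, not the tm function removeWords.)
remove_words <- function(x, words) {
  pattern <- paste0("\\b(", paste(words, collapse = "|"), ")\\b")
  x <- gsub(pattern, "", x, perl = TRUE)
  trimws(gsub("\\s+", " ", x))
}

my_stopwords <- c("the", "is", "a")
remove_words("the party is a winner", my_stopwords)
# [1] "party winner"
```

Note that "party" keeps its "a", because the pattern only matches whole words.
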
# Now let us create a corpus from these documents and compute a term-document matrix from it (corpora are collections of documents
# containing (natural language) text). For more, please look at the help on Corpus under the "tm" package.
> bjp.corpus = Corpus(VectorSource(bjp.emo.docs))
> bjp.tdm = TermDocumentMatrix(bjp.corpus)
> bjp.tdm = as.matrix(bjp.tdm)
> colnames(bjp.tdm) = bjp_emos

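A term-document matrix has one row per word, one column per document, and each cell holds how often that word occurs in that document. The toy sketch below builds such a matrix by hand in base R to show the shape of the result; the two documents are made up, and the "tm" package applies its own tokenisation and filtering, so its output on real data can differ.

```r
# Two made-up documents, one per emotion
docs <- c(joy = "great win great", anger = "bad loss")

# Split each document into words and collect the sorted vocabulary
tokens <- strsplit(docs, " ", fixed = TRUE)
terms <- sort(unique(unlist(tokens, use.names = FALSE)))

# Count, for every document, how often each vocabulary word occurs
tdm <- vapply(tokens,
              function(tk) tabulate(match(tk, terms), nbins = length(terms)),
              integer(length(terms)))
rownames(tdm) <- terms
tdm
#       joy anger
# bad     0     1
# great   2     0
# loss    0     1
# win     1     0
```

comparison.cloud() takes a matrix of this shape and sizes each word by how strongly it is associated with one column relative to the others.
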
# creating, comparing and plotting the words on the cloud
> comparison.cloud(bjp.tdm, colors = brewer.pal(n_bjp_emos, "Dark2"),
scale = c(3,.5), random.order = FALSE, title.size = 1.5)

That’s all folks!

I hope you now have a basic understanding of text analytics using Twitter data.

If you have any questions, please contact me.

Good Luck!!

130 thoughts on “Sentiment Analysis on Twitter Data : Text Analytics Tutorial”

Ratnam Dodda said:

March 31, 2016 at 10:42 am

install.packages(“twitteR”)
install.packages(“RCurl”)
require(twitteR)
require(RCurl)
consumer_key <-''
consumer_secret <- ''
access_token <- ''
access_secret <- ''
setup_twitter_oauth(consumer_key,consumer_secret,access_token,access_secret)

when i was run this code i got an error can u please solve my problem
Error: could not find function "setup_twitter_oauth"

- krunal said:
  
  April 27, 2016 at 11:56 am
  
  hi you have used naive bayes algorithm , is there any way of doing SVM for this kindo of data
  
Kendell A. Piñeros H (@Kendellpineros) said:

May 3, 2016 at 8:32 am

# First we will remove retweet entities from the stored tweets (text)
> bjp_txt = gsub(“(RT|via)((?:\\b\\W*@\\w+)+)”, “”, bjp_txt)

This didnt work, what appears is :Error: unexpected input in “bjp_txt= gsub(“”

- MANOJ KUMAR said:
  
  May 3, 2016 at 11:29 am
  
  please check back quotes, they may not be UTF8 format … “”
  
  - Kendell A. Piñeros H (@Kendellpineros) said:
    
    May 3, 2016 at 9:54 pm
    
    Thanks, i correct that
    
Kendell said:

May 5, 2016 at 9:21 pm

I get this problem when im trying to run the classify emotion comand

Error: inherits(x, c(“DocumentTermMatrix”, “TermDocumentMatrix”)) is not TRUE.

How do i solve it?

Thanks

- MANOJ KUMAR said:
  
  May 21, 2016 at 12:28 pm
  
  may be the package updated something. you should see package documentation for that.
  
Dilip said:

May 19, 2016 at 3:30 pm

Hi,
While installing “tm” package i get this error.

unable to identify current timezone ‘C’:
please set environment variable ‘TZ’

Please help me to get this issue resolved.

- MANOJ KUMAR said:
  
  May 21, 2016 at 12:26 pm
  
  TM package is available only in archive and one can install by importing zip file.
  
  - Shahnawaz said:
    
    October 18, 2016 at 2:16 pm
    
    tm package also can be install using spain CRAN mirror
    
rebeen said:

June 1, 2016 at 8:18 pm

dear Manoj please I want to talk with please can you send me email

Mohammad Daoud said:

June 2, 2016 at 5:48 pm

classify_emotion() : showing error , function not found , but i am already install sentiment package

Joe said:

October 29, 2016 at 12:08 am

sentiment package is no longer available for R 3.3.1 and R Studio 0.99. Are ther any oda alternatives to this pkg?

- Isha said:
  
  February 17, 2017 at 2:25 pm
  
  sentimentr package is available in 3.3.2. You can check it.
  
- prasad said:
  
  February 22, 2017 at 12:25 pm
  
  alteranative package for sentiment is syuzhet package
  
prasad said:

February 22, 2017 at 12:24 pm

HI how to get tweet likes from twittter in R

Shubham said:

March 12, 2017 at 12:56 pm

i want to do sentiment analysis on donald trump elections and other opinion about him in us etc. I dont know how to do it with R. I need help..

isabel erpel said:

April 27, 2017 at 6:57 am

Hi i’m working on a MacBook Pro, Studio 3.4.0, and im trying to download tweets from twitter, im working on this code:
cKey <- 'XXXXXX'
cSecret <- 'XXXXXX'
reqURL <-'https://api.twitter.com/oauth/request_token'
accessURL <-'https://api.twitter.com/oauth/access_token'
authURL<-'https://api.twitter.com/oauth/authorize'
Access_Token<-'XXXXX'
Access_Token_Secret<-'XXXXXXXXX'
twitteR:::setup_twitter_oauth(cKey,cSecret,Access_Token,Access_Token_Secret)

tweets <- searchTwitter(#calbuco,n=3000, lang="es")

but the last line gives me the following error Error in searchTwitter(search.string, n = no.of.tweets, lang = "es") :
could not find function "searchTwitter"

Please help!

- DataScientist said:
  
  April 27, 2017 at 3:24 pm
  
  https://www.rdocumentation.org/packages/twitteR/versions/1.1.9/topics/searchTwitter
  
  Did you install Package ‘twitteR’ ??
  
- DataScientist said:
  
  April 27, 2017 at 3:33 pm
  
  library(twitteR)
  
  cKey <- 'xxxxx'
  cSecret <- 'xxxxxx'
  Access_Token <- 'xxxxxxxx'
  Access_Token_Secret <- 'xxxxxxxx'
  
  reqURL <- 'https://api.twitter.com/oauth/request_token'
  accessURL <- 'https://api.twitter.com/oauth/access_token'
  authURL <- 'https://api.twitter.com/oauth/authorize'
  
  setup_twitter_oauth(cKey,cSecret,Access_Token,Access_Token_Secret)
  # in above select 1 to install ".httr-oauth" :
  #########################################################
  # [1] "Using direct authentication"
  # Use a local file ('.httr-oauth'), to cache OAuth access credentials between R sessions?
  
  # 1: Yes
  # 2: No
  
  # Selection: 1
  # Adding .httr-oauth to .gitignore
  #########################################################
  
  tweets <- searchTwitter('#calbuco', n=3000, lang='es')
  
  And I was able to get the tweets…. follow the above
  