Sunday, February 21, 2016

Using Twitter from the R console

Lately I have been messing around with R, and I decided to check out the twitteR package to see if I can post from the R console. To use Twitter from the R console, we need a couple of things:


  • Set up OAuth authentication for Twitter
  • Install the twitteR package

Set up OAuth authentication for Twitter
As of March 2013, OAuth authentication is required for all Twitter transactions. If you don't already have an OAuth setup, head over to Twitter here: https://apps.twitter.com/app/new

Follow the instructions. Once you are done, you will see the following four items:

Consumer Key (API Key)
Consumer Secret (API Secret)

Access Token
Access Token Secret


Install the twitteR package
Now, in your R program, install the twitteR package.
Once the package is installed, it is time to get busy.
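
The installation step itself is not shown above, so here is a minimal sketch of one way to do it, assuming twitteR is available from the CRAN mirror you use. The helper name install_if_missing is made up for this example and is not part of twitteR; it only wraps the standard requireNamespace and install.packages functions.

```r
# Install a package from CRAN only when it is not already available.
# install_if_missing is a hypothetical helper name, not part of twitteR.
install_if_missing <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    return(invisible(FALSE))  # already installed, nothing to do
  }
  install.packages(pkg)
  invisible(TRUE)
}
```

Running install_if_missing("twitteR") once at the console should be enough; on later runs it does nothing because the package is already there.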


Load the package by executing the following command:


library(twitteR)


Now it is time to set up authentication. You do that with the setup_twitter_oauth function. Below is an example; make sure to replace the keys and tokens with the values you got back when you set up OAuth on Twitter.



setup_twitter_oauth("API key", "API secret", "Access token", "Access secret")
[1] "Using direct authentication"
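
Typing the keys straight into a script makes them easy to leak, so one option is to keep them in environment variables instead. The sketch below assumes four variable names (TWITTER_CONSUMER_KEY and friends) that I made up for this example; the helper names are hypothetical too. Only Sys.getenv and setup_twitter_oauth are real functions.

```r
# Hypothetical environment variable names; pick whatever fits your setup.
credential_vars <- c(
  consumer_key    = "TWITTER_CONSUMER_KEY",
  consumer_secret = "TWITTER_CONSUMER_SECRET",
  access_token    = "TWITTER_ACCESS_TOKEN",
  access_secret   = "TWITTER_ACCESS_SECRET"
)

# Read the four values and fail early if any of them is unset or empty.
read_twitter_credentials <- function(vars = credential_vars) {
  values <- Sys.getenv(unname(vars), unset = NA, names = FALSE)
  missing_vars <- vars[is.na(values) | !nzchar(values)]
  if (length(missing_vars) > 0) {
    stop("Missing environment variables: ",
         paste(missing_vars, collapse = ", "))
  }
  names(values) <- names(vars)
  as.list(values)
}

# Authenticate with twitteR using the values read above.
authenticate_from_env <- function() {
  creds <- read_twitter_credentials()
  twitteR::setup_twitter_oauth(
    creds$consumer_key, creds$consumer_secret,
    creds$access_token, creds$access_secret
  )
}
```

With the variables set (for example in your .Renviron file), calling authenticate_from_env() takes the place of the hard-coded call above.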

If that is all set, we can send a tweet. To update your Twitter status, you can use the updateStatus function. It is very simple to use: you pass your status into the function. Here is what it looks like from the console:

updateStatus('testing Tweeting with twitterR package from witin Revolution R Enterprise')
[1] "DenisGobo: testing Tweeting with twitterR package from witin Revolution R Enterprise"

The same status then shows up on Twitter.


Of course, nobody is doing all of this just to update their status. The reason I am playing around with this is that I want to do Twitter searches and then store the results in a file or database. So let's do a simple search for the tag #rstats, and let's also limit the search to return only 6 results:

tweets <- searchTwitter('#rstats', n=6) 
tweets

Here is what we got back. As you can see, some of the results end in …; those have been truncated.

[[1]]
[1] "psousa75: RT @rquintino: @Mairos_B #sqlsatportugal session: all about R in #SqlServer 2016 #rstats https://t.co/DHrqIZrz1e"

[[2]]
[1] "millerdl: a quick script to use imgcat in #rstats https://t.co/fpUlgWNX33 https://t.co/AhCCMLewCH"

[[3]]
[1] "diana_nario: RT @KirkDBorne: Useful packages (libraries) for Data Analysis in R: https://t.co/haRKopFyly #DataScience #Rstats by @analyticsvidhya https:…"

[[4]]
[1] "emjonaitis: Hey #rstats tweeps, do you have any readings to recommend on sensitivity analysis? Books/articles/websites all welcome."

[[5]]
[1] "caryden: RT @KirkDBorne: A Complete Tutorial on Time Series Modeling in R: https://t.co/7oI6JKyU4E #MachineLearning #DataScience #Rstats by @Analyti…"

[[6]]
[1] "ArkangelScrap: RT @KirkDBorne: A Complete Tutorial on Time Series Modeling in R: https://t.co/7oI6JKyU4E #MachineLearning #DataScience #Rstats by @Analyti…"


What I really want is to convert the output to a data frame. Luckily, the twitteR package has this built in: you can use twListToDF. Here is how to do that:

tweets <- searchTwitter('#rstats', n=6) 
twListToDF(tweets)

The output now has a lot more columns. You can see whether each tweet has been retweeted or favorited, as well as the latitude, longitude and more:


1                             RT @rquintino: @Mairos_B #sqlsatportugal session: all about R in #SqlServer 2016 #rstats https://t.co/DHrqIZrz1e
2                                                      a quick script to use imgcat in #rstats https://t.co/fpUlgWNX33 https://t.co/AhCCMLewCH
3 RT @KirkDBorne: Useful packages (libraries) for Data Analysis in R: https://t.co/haRKopFyly #DataScience #Rstats by @analyticsvidhya https:…
4                      Hey #rstats tweeps, do you have any readings to recommend on sensitivity analysis? Books/articles/websites all welcome.
5 RT @KirkDBorne: A Complete Tutorial on Time Series Modeling in R: https://t.co/7oI6JKyU4E #MachineLearning #DataScience #Rstats by @Analyti…
6 RT @KirkDBorne: A Complete Tutorial on Time Series Modeling in R: https://t.co/7oI6JKyU4E #MachineLearning #DataScience #Rstats by @Analyti…
  favorited favoriteCount replyToSN             created truncated replyToSID
1     FALSE             0        NA 2016-02-20 20:29:54     FALSE         NA
2     FALSE             0        NA 2016-02-20 20:24:50     FALSE         NA
3     FALSE             0        NA 2016-02-20 20:16:25     FALSE         NA
4     FALSE             0        NA 2016-02-20 20:11:08     FALSE         NA
5     FALSE             0        NA 2016-02-20 20:11:06     FALSE         NA
6     FALSE             0        NA 2016-02-20 20:02:05     FALSE         NA
                  id replyToUID
1 701141750161784834         NA
2 701140474019577856         NA
3 701138356466483204         NA
4 701137026075140096         NA
5 701137018508722176         NA
6 701134750296227840         NA
                                                                            statusSource
1                Mobile Web (M5)
2 Tweetbot for Mac
3   Twitter for Android
4                     Twitter Web Client
5     Twitter for iPhone
6                     Twitter Web Client
     screenName retweetCount isRetweet retweeted longitude latitude
1      psousa75            3      TRUE     FALSE        NA       NA
2      millerdl            0     FALSE     FALSE        NA       NA
3   diana_nario           50      TRUE     FALSE        NA       NA
4    emjonaitis            0     FALSE     FALSE        NA       NA
5       caryden           41      TRUE     FALSE        NA       NA
6 ArkangelScrap           41      TRUE     FALSE        NA       NA
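
Once the results are in a data frame, ordinary data frame operations apply. As a minimal sketch, the block below rebuilds three of the columns by hand from the output above (so it runs without a Twitter connection), drops the retweets, and sorts by retweet count. With a live search you would start from twListToDF(tweets) instead; the variable names are my own.

```r
# Three columns copied by hand from the twListToDF output shown above.
tweets_df <- data.frame(
  screenName   = c("psousa75", "millerdl", "diana_nario",
                   "emjonaitis", "caryden", "ArkangelScrap"),
  retweetCount = c(3, 0, 50, 0, 41, 41),
  isRetweet    = c(TRUE, FALSE, TRUE, FALSE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Keep only original tweets (rows where isRetweet is FALSE).
originals <- tweets_df[!tweets_df$isRetweet, ]

# Sort all rows by retweet count, highest first.
by_retweets <- tweets_df[order(-tweets_df$retweetCount), ]

print(originals$screenName)
print(head(by_retweets, 3))
```

Only two of the six results (millerdl and emjonaitis) are original tweets; the rest are retweets.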


Now that we have a data frame, let's dump it into a CSV file. Here is the command to write the output to a CSV file:

write.csv(twListToDF(tweets), file = "c:/temp/Tweets.csv")
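
Two details are worth a quick check if you plan to read the file back into R later. By default write.csv adds the row names as an extra first column, and read.csv will parse an 18-digit tweet id as a number, which cannot hold that many digits exactly. The sketch below assumes the id column is stored as text (the data frame here is rebuilt by hand from two rows of the output above) and writes to a temporary file rather than c:/temp.

```r
# Two rows copied by hand from the output above; id is kept as text.
tweets_df <- data.frame(
  id         = c("701141750161784834", "701140474019577856"),
  screenName = c("psousa75", "millerdl"),
  stringsAsFactors = FALSE
)

# Temporary path used for this example instead of c:/temp/Tweets.csv.
csv_path <- tempfile(fileext = ".csv")

# row.names = FALSE leaves out the extra column of row numbers.
write.csv(tweets_df, file = csv_path, row.names = FALSE)

# Force id back to character so the full value survives the round trip.
restored <- read.csv(csv_path,
                     colClasses = c(id = "character"),
                     stringsAsFactors = FALSE)

print(identical(restored$id, tweets_df$id))
```

Without the colClasses argument the id column comes back as a number and the last digits may no longer match.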


Here is what it looks like if you open the CSV file in Excel:





As you can see, each column is filled with the correct data. How about, instead of writing to a CSV file, we write the data into a database? That is pretty easy as well; we need the RODBC package to accomplish that. You can see that post here: How to store twitter search results from R into SQL Server


