A month or so ago, I came across a cool article about a civic group in Bloomington, IN called BMGhack that coded a Twitter bot in Python for their local animal services department. I’d been reading articles about the earlier twitteR and newer rtweet packages for a while and thought this was something I could tackle in R. The one I created is for Louisville Metro Animal Services (LMAS), but petfinder.com hosts pet profiles from many organizations, so you should have no problem finding one in your area. To start, let’s create the app and get a token.
Create an app and a token
For the most part, everything is explained in Kearney’s tutorial, so I won’t rehash it here. There is, however, currently a problem with the token that create_token makes: it’s a read-only token, and we need a read-and-write token in order to tweet using post_tweet.
The good news is that I have a temporary fix until the issue is resolved. After making the token with create_token, you’ll need to regenerate your token on the apps.twitter.com site; now you’ll have a read-and-write token. Then replace the old token and secret values with the new ones, write your new token object to a .rds file in your home directory, and Bob’s your uncle.
library(tidyverse)
library(rtweet)
appname <- "<app name>"
key <- "<key>"
secret <- "<secret>"
# create (read-only) token
twitter_token <- create_token(
app = appname,
consumer_key = key,
consumer_secret = secret
)
# get the home directory path
home_directory <- path.expand("~/")
# replace with read and write token and secret values
twitter_token$credentials$oauth_token <- "<new token>"
twitter_token$credentials$oauth_token_secret <- "<new token secret>"
write_rds(twitter_token, "<home directory path>twitter_token.rds")
file_name <- file.path(home_directory, "twitter_token.rds")
# create environment variable so rtweet doesn't need the token loaded explicitly in order to work
cat(paste0("TWITTER_PAT=", file_name),
file = file.path(home_directory, ".Renviron"),
append = TRUE,
fill = TRUE)
GET some pet data
There are two options for getting the information we need: scrape the animal services’ site or use the petfinder.com API. There are pros and cons to each, but I’m going to use the API. To access the API, you’ll need a key, and to get that key you’ll need to follow the link to the documentation above and create an account.
Once we have our key, we need to figure out the shelter ID. The location argument can take a ZIP code or city/state input, and there’s also an XML or JSON option for the response format. The LMAS website lists the phone number and addresses, so I’m going to use the ZIP codes to narrow the field. We’re left with three entries, and from there we see our shelter ID is “KY102”. Now we can use the shelter.getPets method to download our pet data. I’ll go into a little more detail about the code later in the post.
library(jsonlite)
library(httr)
URL <- "http://api.petfinder.com/shelter.find"
args <- list(key = "<key>", location = "Louisville, KY", format = "json", count = "200")
api_json <- GET(url = URL, query = args)
# lets you know if any errors occurred in the GET request
stop_for_status(api_json)
# creates character vector that's needed for fromJSON
content_json <- content(api_json, as = "text", encoding = "UTF-8")
# creates list of nested data.frames
obj_json <- fromJSON(content_json)
# find the shelter ID among the Louisville-area organizations
flatten(obj_json$petfinder$shelters$shelter) %>%
rename_at(vars(ends_with(".$t")), ~str_replace(., "\\.\\$t", "")) %>%
filter(zip == "40205" | zip == "40218") %>%
select(name, email, id)
name email id
1 Pit Bulls of St. Francis adopt@pitbullsofstfrancis.org KY507
2 Louisville Metro Animal Services AnimalServicesAdoption@louisvilleky.gov KY102
3 Kentucky Great Dane Rescue KentuckyGreatDaneRescue@Yahoo.com KY404
Next, we use our shelter ID to pull some pet data, but you’ll notice I’m only pulling 2 records. This is for EDA purposes. In the final script, I recommend you go to your shelter’s website to get an estimate of how many pets they have for adoption. Once you have an estimate, you can set the count argument to a number greater than the records available for your shelter, and it won’t be an issue. LMAS has profiles for around 120 pets, so I ended up setting my count equal to 200.
URL <- "http://api.petfinder.com/shelter.getPets"
args <- list(key = "<key>", id = "KY102", format = "json", output = "full", count = "2")
api_json <- GET(url = URL, query = args)
# lets you know if any errors occurred in the GET request
stop_for_status(api_json)
# creates character vector that's needed for fromJSON
content_json <- content(api_json, as = "text", encoding = "UTF-8")
The data I’ll be using for this post are available at my github: content_json_eda and content_json. prettify lets us investigate the nested structure of the JSON we pulled from the API. I’m only showing the top part, as these structures can be quite large and complex. Here we can see that the meat of the data we want to work with starts three levels down: petfinder –> pets –> pet.
# example data; character vector
content_json_eda <- read_rds("vector_json.rds") # 2 records
content_json <- read_rds("vector_json2.rds") # 200 records
# Get a sense of the nested structure.
content_json_eda %>% prettify
{
"@encoding": "iso-8859-1",
"@version": "1.0",
"petfinder": {
"@xmlns:xsi": "http://www.w3.org/2001/XMLSchema-instance",
"lastOffset": {
"$t": "2"
},
"pets": {
"pet": [
{
"options": {
"option": {
"$t": "altered"
}
},
"status": {
"$t": "A"
},
"contact": {
...
Formulate the tweet
I haven’t experienced any problems with the Petfinder.com API, but if errors do occur, they should be captured by the stop_for_status function earlier. If not, you can check obj_json$header$status$message for answers. Definitions for the status messages can be found in the API documentation.
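If stop_for_status passes but the payload still looks off, a quick diagnostic sketch like the one below lets you eyeball that status block; this assumes obj_json has been created with fromJSON (as in the next chunk), and the exact nesting (e.g. values sitting under $t) may vary.
# inspect the API's own status block for the code/message pair
# defined in the API documentation
api_status <- obj_json$header$status
str(api_status)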
After applying fromJSON and flatten, we have a dataframe that we can manipulate with familiar data cleaning methods. Most of the cleaning isn’t noteworthy, but I will mention the weights variable that I created. It’s an attempt to give pets that have had longer stays in the shelter a greater opportunity to be noticed by potential adopters: the greater the weight, the more likely that pet is to be selected. It’s based on the lastUpdate variable, which isn’t defined in the API documentation. I’m assuming it’s the last time the shelter updated the pet’s information, but it could be the last time Petfinder updated their information (or something else entirely). If it’s the former, then judging by the lack of variation in the status column, it might provide some measure of length of stay. If it’s the latter, the effort could be a complete whiff.
# creates list of nested data.frames
obj_json <- fromJSON(content_json)
# api_message <- obj_json$header$status$message
# flatten creates a df; some cols have ".$t" in their names.
# The weights more heavily favor pets that have been in the shelter
# longer, so those pets get a greater opportunity to be seen.
pet_df <- flatten(obj_json$petfinder$pets$pet) %>%
rename_at(vars(ends_with(".$t")), ~str_replace(., "\\.\\$t", "")) %>%
mutate(lastUpdate = as.POSIXct(lastUpdate),
link = paste0("https://www.petfinder.com/petdetail/", id),
sex = recode(sex, "F" = "Female", "M" = "Male"),
size = recode(size, "L" = "Large", "S" = "Small",
"M" = "Medium", "XL" = "Extra Large"),
status = recode(status, "A" = "Adoptable", "H" = "Hold",
"P" = "Pending", "X" = "Adopted/Removed"),
name = lettercase::str_title_case(tolower(name)),
elapsedTime = round(Sys.time() - lastUpdate, 0),
rank = rank(elapsedTime)
) %>%
mutate(weights = portfolio::weight(., in.var = "rank", type = "linear", sides = "long", size = "all")) %>%
sample_n(size = 1, weight = weights)
glimpse(pet_df)
## Observations: 1
## Variables: 26
## $ options.option <chr> "altered"
## $ status <chr> "Adoptable"
## $ contact.phone <chr> "(502) 473-7387"
## $ contact.state <chr> "KY"
## $ contact.address2 <chr> "3705 Manslick Road Intake Facility"
## $ contact.email <chr> "AnimalServicesAdoption@louisvilleky.gov"
## $ contact.city <chr> "Louisville"
## $ contact.zip <chr> "40218"
## $ contact.address1 <chr> "3516 Newburg Road Adoption Center"
## $ age <chr> "Senior"
## $ size <chr> "Medium"
## $ media.photos.photo <list> [<c("pnt", "fpm", "x", "pn", "t"), c("http...
## $ id <chr> "41246124"
## $ shelterPetId <chr> "A637255"
## $ breeds.breed <list> [["Beagle"]]
## $ name <chr> "Willy"
## $ sex <chr> "Male"
## $ description <chr> NA
## $ mix <chr> "no"
## $ shelterId <chr> "KY102"
## $ lastUpdate <dttm> 2018-03-23
## $ animal <chr> "Dog"
## $ link <chr> "https://www.petfinder.com/petdetail/41246124"
## $ elapsedTime <time> 45 days
## $ rank <dbl> 99.5
## $ weights <dbl> 0.01230349
Looking at the columns in pet_df, you can see some of the classes are lists that potentially need unnesting. options.option can have values that give information about whether the pet has been neutered (“altered”), housebroken, good with other animals, etc. breeds.breed can have multiple values if the animal is a mixed breed.
Since unnesting these columns can result in different numbers of rows, the unnesting needs to occur in separate code chunks. Also, the column names depend on whether the column is a nested list or not. For example, if breeds.breed is nested, there will be a $t appended to the column name. As long as these columns are non-NA, they’re added to bot_df, which will be the primary dataframe we use to create the tweet.
# Dataframe to add columns to
bot_df <- pet_df %>%
select(`pet type` = animal, age, sex, size, link)
# Different colnames depending on nrows unnested
if(!is.na(pet_df$options.option)) {
pet_options <- pet_df %>%
select(options.option) %>%
unnest %>%
rename_at(vars(matches("\\$t")), ~str_replace(., "\\$t", "options")) %>%
rename_at(vars(matches("options.option")), ~str_replace(., "options.option", "options")) %>%
summarize(misc = glue::collapse(options, sep = ", "))
bot_df <- bot_df %>%
bind_cols(pet_options) %>%
select(`pet type`, misc, everything())
}
if(!is.na(pet_df$breeds.breed)) {
pet_breeds <- pet_df %>%
select(breeds.breed) %>%
unnest %>%
rename_at(vars(matches("\\$t")), ~str_replace(., "\\$t", "breeds")) %>%
rename_at(vars(matches("breeds.breed")), ~str_replace(., "breeds.breed", "breeds")) %>%
summarize(`breed(s)` = glue::collapse(breeds, sep = ", "))
bot_df <- bot_df %>%
bind_cols(pet_breeds) %>%
select(`pet type`, `breed(s)`, everything())
}
glimpse(bot_df)
## Observations: 1
## Variables: 7
## $ `pet type` <chr> "Dog"
## $ `breed(s)` <chr> "Beagle"
## $ misc <chr> "altered"
## $ age <chr> "Senior"
## $ sex <chr> "Male"
## $ size <chr> "Medium"
## $ link <chr> "https://www.petfinder.com/petdetail/41246124"
select_if keeps columns without values out of our message template, so the tweet doesn’t display an “NA” when a column is vacant. On a side note, I have really started to love these special-case dplyr functions (-if, -at, -all). They’ve become such mental energy savers for me. I recommend keeping Suzan Baert’s indispensable dplyr tutorial series bookmarked, because there are some syntax quirks when using these functions that I don’t find completely intuitive.
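To make the select_if behavior concrete, here’s a toy sketch; the tibble and its columns are made up for illustration.
# one row with a vacant description column
toy <- tibble(age = "Senior", sex = "Male", description = NA_character_)
# keep only the columns whose value is non-NA;
# returns a 1 x 2 tibble (age, sex) -- description is dropped
toy %>% select_if(~!is.na(.))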
I had never worked with glue before this project, but wow, what a cool package. Here, I’ve layered two different templates by collapsing the first template, adding another column, and using glue_data again to create my final message. If you’re unfamiliar with the package, there are two similar functions, glue and glue_data. The only difference is that glue_data looks for variables in its first argument instead of the calling environment. So, for example, glue_data won’t pay attention to any name variable defined outside of my pipeline, because bot_df is its first argument.
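A toy contrast makes the difference plain (both name values below are made up):
name <- "Calling Environment"
df <- data.frame(name = "First Argument")
glue::glue("Hello, {name}")          # Hello, Calling Environment
glue::glue_data(df, "Hello, {name}") # Hello, First Argument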
Also, you should be aware that Petfinder.com requires including “Powered by Petfinder.com” in your app. It’s a small price to pay in order to help some pets find good homes.
message <- bot_df %>%
select_if(~!is.na(.)) %>%
gather %>%
mutate(spec = glue::glue_data(., "
{key}: {value}
")) %>%
summarize(spec = glue::collapse(spec, sep = "\n")) %>%
add_column(name = pet_df$name) %>%
mutate(message = glue::glue_data(., "
{name} is:
{spec}
#adoptdontshop #rescue #adoptme #shelterpets
Powered by Petfinder.com
")) %>%
select(message)
head(message)
## message
## 1 Willy is:\npet type: Dog\nbreed(s): Beagle\nmisc: altered\nage: Senior\nsex: Male\nsize: Medium\nlink: https://www.petfinder.com/petdetail/41246124\n#adoptdontshop #rescue #adoptme #shelterpets\nPowered by Petfinder.com
You can’t have a pet app and not have an oversized image that unleashes all the cuteness. Gotta tug on those heartstrings. Unfortunately, Twitter doesn’t display images from an image link, so we’re going to have to jump through some hoops. Twitter does permit image files to be uploaded, and one of our columns has links to pictures of varying resolutions. We pull the image into our environment from the URL and write it to a temporary file on disk that will be fed to post_tweet later.
# Some images are blurry on Twitter, but row 3 seems to work alright
if(is.null(pet_df$media.photos.photo[[1]])) {
# Default photo if no photo is provided
img_url <- "http://www.dogsinpictures.com/images/dog-cat-bunny-bird-love.jpg"
image_obj <- magick::image_read(img_url)
tmp <- tempfile(fileext=".jpg")
magick::image_write(image_obj, path = tmp, format = "jpg")
} else {
img_df <- pet_df %>%
select(media.photos.photo) %>%
unnest %>%
slice(3)
img_url <- img_df$`$t`[[1]]
image_obj <- magick::image_read(img_url)
tmp <- tempfile(fileext=".jpg")
magick::image_write(image_obj, path = tmp, format = "jpg")
}
Tweet and automate
We’re set to tweet!
post_tweet(message[[1]], media = tmp)
Louise is:
pet type: Dog
breed(s): Pit Bull Terrier
misc: altered
age: Adult
sex: Female
size: Medium
link: https://t.co/4h99GrVaMJ
#adoptdontshop #rescue #adoptme #shelterpets
Powered by https://t.co/T0HSpQ8toe pic.twitter.com/ojgz1JwK42
— Eric (@erbo_exp_bot) April 30, 2018
To be a bot, though, it needs to be automated. One way to accomplish this is to set up Windows Task Scheduler. You can create the task through the program itself; in that case, I recommend you check out the bottom of McCann’s post on RPubs. I used the taskscheduleR package, which I preferred because of the log file it creates. Task Scheduler has a history tab and a log, but it doesn’t show you any useful information if an error is thrown. If you do use taskscheduleR and its add-in, make sure the date format matches your computer’s.
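For reference, here’s a minimal scheduling sketch with taskscheduleR; the task name, script path, and times are placeholders you’d adjust for your own setup.
library(taskscheduleR)
taskscheduler_create(
  taskname  = "pet_bot",                      # hypothetical task name
  rscript   = "C:/path/to/pet_bot.R",         # hypothetical path to your bot script
  schedule  = "DAILY",
  starttime = "09:00",
  startdate = format(Sys.Date(), "%m/%d/%Y")  # must match your computer's date format
)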
Once your task is initiated, I recommend setting a few options in Task Scheduler. In your task’s General tab, you want to make sure the “Run with highest privileges” box is ticked; otherwise, the UAC box pops up every time the task runs. Also, tick “Run whether user is logged on or not” and (if necessary) “Do not store password…”. Ticking those boxes will make the task run in the background without opening a window. There are some other settings worth considering, so you should do some investigating on your own. Also, make sure Rscript.exe has high enough permissions. I gave mine “full control”, but it may not need to be that high.
Lastly, you have to read in the token and point to it explicitly in post_tweet. Otherwise, for some reason, when the script is executed, lines get written to your .Renviron and bogus copies of your token file are created.
twitter_token <- read_rds("<home directory>twitter_token.rds")
post_tweet(message[[1]], media = tmp, token = twitter_token)
file.remove(tmp)
Session info
Session info --------------------------------------------------------------------------------------------------------
setting value
version R version 3.5.0 (2018-04-23)
system i386, mingw32
ui RStudio (1.1.423)
language (EN)
collate English_United States.1252
tz America/New_York
date 2018-05-03
Packages ------------------------------------------------------------------------------------------------------------
package * version date source
assertthat 0.2.0 2017-04-11 CRAN (R 3.5.0)
base * 3.5.0 2018-04-23 local
bindr 0.1.1 2018-03-13 CRAN (R 3.5.0)
bindrcpp * 0.2.2 2018-03-29 CRAN (R 3.5.0)
broom 0.4.4 2018-03-29 CRAN (R 3.5.0)
cellranger 1.1.0 2016-07-27 CRAN (R 3.5.0)
cli 1.0.0 2017-11-05 CRAN (R 3.5.0)
colorspace 1.3-2 2016-12-14 CRAN (R 3.5.0)
compiler 3.5.0 2018-04-23 local
crayon 1.3.4 2017-09-16 CRAN (R 3.5.0)
curl 3.2 2018-03-28 CRAN (R 3.5.0)
datasets * 3.5.0 2018-04-23 local
devtools 1.13.5 2018-02-18 CRAN (R 3.5.0)
digest 0.6.15 2018-01-28 CRAN (R 3.5.0)
dplyr * 0.7.4 2017-09-28 CRAN (R 3.5.0)
forcats * 0.3.0 2018-02-19 CRAN (R 3.5.0)
foreign 0.8-70 2017-11-28 CRAN (R 3.5.0)
ggplot2 * 2.2.1 2016-12-30 CRAN (R 3.5.0)
glue 1.2.0 2017-10-29 CRAN (R 3.5.0)
graphics * 3.5.0 2018-04-23 local
grDevices * 3.5.0 2018-04-23 local
grid 3.5.0 2018-04-23 local
gtable 0.2.0 2016-02-26 CRAN (R 3.5.0)
haven 1.1.1 2018-01-18 CRAN (R 3.5.0)
hms 0.4.2 2018-03-10 CRAN (R 3.5.0)
httr * 1.3.1 2017-08-20 CRAN (R 3.5.0)
jsonlite * 1.5 2017-06-01 CRAN (R 3.5.0)
knitr 1.20 2018-02-20 CRAN (R 3.5.0)
lattice 0.20-35 2017-03-25 CRAN (R 3.5.0)
lazyeval 0.2.1 2017-10-29 CRAN (R 3.5.0)
lettercase 0.13.1 2016-03-03 CRAN (R 3.5.0)
lubridate 1.7.4 2018-04-11 CRAN (R 3.5.0)
magick 1.8 2018-03-19 CRAN (R 3.5.0)
magrittr 1.5 2014-11-22 CRAN (R 3.5.0)
memoise 1.1.0 2017-04-21 CRAN (R 3.5.0)
methods * 3.5.0 2018-04-23 local
mime 0.5 2016-07-07 CRAN (R 3.5.0)
mnormt 1.5-5 2016-10-15 CRAN (R 3.5.0)
modelr 0.1.1 2017-07-24 CRAN (R 3.5.0)
munsell 0.4.3 2016-02-13 CRAN (R 3.5.0)
nlme 3.1-137 2018-04-07 CRAN (R 3.5.0)
openssl 1.0.1 2018-03-03 CRAN (R 3.5.0)
parallel 3.5.0 2018-04-23 local
pillar 1.2.1 2018-02-27 CRAN (R 3.5.0)
pkgconfig 2.0.1 2017-03-21 CRAN (R 3.5.0)
plyr 1.8.4 2016-06-08 CRAN (R 3.5.0)
portfolio 0.4-7 2015-01-29 CRAN (R 3.5.0)
psych 1.8.3.3 2018-03-30 CRAN (R 3.5.0)
purrr * 0.2.4 2017-10-18 CRAN (R 3.5.0)
R6 2.2.2 2017-06-17 CRAN (R 3.5.0)
Rcpp 0.12.16 2018-03-13 CRAN (R 3.5.0)
readr * 1.1.1 2017-05-16 CRAN (R 3.5.0)
readxl 1.1.0 2018-04-20 CRAN (R 3.5.0)
reshape2 1.4.3 2017-12-11 CRAN (R 3.5.0)
rlang 0.2.0 2018-02-20 CRAN (R 3.5.0)
rstudioapi 0.7 2017-09-07 CRAN (R 3.5.0)
rtweet * 0.6.0 2017-11-16 CRAN (R 3.5.0)
rvest 0.3.2 2016-06-17 CRAN (R 3.5.0)
scales 0.5.0 2017-08-24 CRAN (R 3.5.0)
stats * 3.5.0 2018-04-23 local
stringi 1.1.7 2018-03-12 CRAN (R 3.5.0)
stringr * 1.3.0 2018-02-19 CRAN (R 3.5.0)
tibble * 1.4.2 2018-01-22 CRAN (R 3.5.0)
tidyr * 0.8.0 2018-01-29 CRAN (R 3.5.0)
tidyselect 0.2.4 2018-02-26 CRAN (R 3.5.0)
tidyverse * 1.2.1 2017-11-14 CRAN (R 3.5.0)
tools 3.5.0 2018-04-23 local
utils * 3.5.0 2018-04-23 local
withr 2.1.2 2018-03-15 CRAN (R 3.5.0)
xml2 1.2.0 2018-01-24 CRAN (R 3.5.0)
yaml 2.1.18 2018-03-08 CRAN (R 3.5.0)
Next Steps
This method should work fine if you’re hosting on a local machine, but for cloud-based hosting, the parts where we write/read the image and token to/from disk may have to be changed so these things are stored in memory. With the current rtweet package, I don’t think the solution is straightforward. I haven’t delved into it too deeply, but one problem is that the image input for post_tweet is limited to a “path”. I did do a cursory examination of the post_tweet function, and I think it can be modified without too much trouble to accept an image object that’s stored in memory.
As for the token portion of the issue, I haven’t looked into it at all. The twitteR package is scheduled for a “leisurely deprecation”, and from my memory of a few articles that used that package, I think there might be some code in there that could help. It’s just a hunch, though. If anyone finds a solution, I’d be very interested.
Acknowledgements
Thanks to Steve Charlesworth for taking the time to answer some questions regarding his bot.
Thanks to Matthew Gotth-Olsen for liaising with Louisville Metro Animal Services.
Special Thanks to
References
[1] C. Boettiger. knitcitations: Citations for ‘Knitr’ Markdown Files. R package version 1.0.8. 2017. URL: https://CRAN.R-project.org/package=knitcitations.
[2] C. Brown. lettercase: Utilities for Formatting Strings with Consistent Capitalization, Word Breaks and White Space. R package version 0.13.1. 2016. URL: https://CRAN.R-project.org/package=lettercase.
[3] J. Wijffels. taskscheduleR: Schedule R Scripts and Processes with the Windows Task Scheduler. R package version 1.1. 2017. URL: https://CRAN.R-project.org/package=taskscheduleR.
[4] J. Hester. glue: Interpreted String Literals. R package version 1.2.0. 2017. URL: https://CRAN.R-project.org/package=glue.
[5] M. W. Kearney. rtweet: Collecting Twitter Data. R package version 0.6.0. 2017. URL: https://cran.r-project.org/package=rtweet.
[6] J. Ooms. magick: Advanced Graphics and Image-Processing in R. R package version 1.8. 2018. URL: https://CRAN.R-project.org/package=magick.
[7] J. Ooms. “The jsonlite Package: A Practical and Consistent Mapping Between JSON Data and R Objects”. In: arXiv:1403.2805 [stat.CO] (2014). URL: https://arxiv.org/abs/1403.2805.
[8] H. Wickham. httr: Tools for Working with URLs and HTTP. R package version 1.3.1. 2017. URL: https://CRAN.R-project.org/package=httr.
[9] H. Wickham. tidyverse: Easily Install and Load the ‘Tidyverse’. R package version 1.2.1. 2017. URL: https://CRAN.R-project.org/package=tidyverse.