A month or so ago, I came across a cool article about a civic group in Bloomington, IN called BMGhack that coded a Twitter bot in Python for their local animal services department. I’d been reading articles about the earlier twitteR and newer rtweet packages for a while and thought this was something I could tackle in R. The one I created is for Louisville Metro Animal Services (LMAS), but petfinder.com hosts pet profiles from many organizations, so you should have no problem finding one in your area. To start, let’s create the app and get a token.
Create an app and a token
For the most part, everything is explained in Kearney’s tutorial, so I won’t rehash it here. There is, however, currently a problem with the token that create_token makes: it’s a read-only token, and we need a read-and-write token in order to tweet using post_tweet.
The good news is that I have a temporary fix until the issue is resolved. After making the token with create_token, you’ll need to regenerate your token on the apps.twitter.com site; now you’ll have a read-and-write token. Then replace the old token and secret values with the new ones, write your new token object to a .rds file in your home directory, and Bob’s your uncle.
library(tidyverse)
library(rtweet)
appname <- "<app name>"
key <- "<key>"
secret <- "<secret>"
# create (read-only) token
twitter_token <- create_token(
app = appname,
consumer_key = key,
consumer_secret = secret
)
# get the home directory path
home_directory <- path.expand("~/")
# replace with read and write token and secret values
twitter_token$credentials$oauth_token <- "<new token>"
twitter_token$credentials$oauth_token_secret <- "<new token secret>"
write_rds(twitter_token, "<home directory path>twitter_token.rds")
file_name <- file.path(home_directory, "twitter_token.rds")
# create environment variable so rtweet doesn't need the token loaded explicitly in order to work
cat(paste0("TWITTER_PAT=", file_name),
file = file.path(home_directory, ".Renviron"),
append = TRUE,
fill = TRUE)
GET some pet data
There are two options for getting the information we need: scrape the animal services’ site or use the petfinder.com API. There are pros and cons to each, but I’m going to use the API. To access the API, you’ll need a key, and to get that key you’ll need to follow the link to the documentation above and create an account.
Once we have our key, we need to figure out the shelter ID. The location argument can take a ZIP code or city/state input, and there’s also an XML or JSON option for the response format. The LMAS website lists the phone number and addresses, so I’m going to use the ZIP codes to narrow the field. We’re left with three entries, and from there we see our shelter ID is “KY102”. Now we can use the shelter.getPets method to download our pet data. I’ll go into a little more detail about the code later in the post.
library(jsonlite)
library(httr)
URL <- "http://api.petfinder.com/shelter.find"
args <- list(key = "<key>", location = "Louisville, KY", format = "json", count = "200")
api_json <- GET(url = URL, query = args)
# lets you know if any errors occurred in the GET request
stop_for_status(api_json)
# creates character vector that's needed for fromJSON
content_json <- content(api_json, as = "text", encoding = "UTF-8")
# creates list of nested data.frames
obj_json <- fromJSON(content_json)
# find the shelter ID among the Louisville-area organizations
flatten(obj_json$petfinder$shelters$shelter) %>%
rename_at(vars(ends_with(".$t")), ~str_replace(., "\\.\\$t", "")) %>%
filter(zip == "40205" | zip == "40218") %>%
select(name, email, id)
name email id
1 Pit Bulls of St. Francis adopt@pitbullsofstfrancis.org KY507
2 Louisville Metro Animal Services AnimalServicesAdoption@louisvilleky.gov KY102
3 Kentucky Great Dane Rescue KentuckyGreatDaneRescue@Yahoo.com KY404
Next, we use our shelter ID to pull some pet data, but you’ll notice I’m only pulling 2 records. This is for EDA purposes. In the final script, I recommend you go to your shelter’s website to get an estimate of how many pets they have for adoption. Once you have an estimate, you can set the count argument to a number greater than the records available for your shelter, and it won’t be an issue. LMAS has profiles for around 120 pets, so I ended up setting my count equal to 200.
URL <- "http://api.petfinder.com/shelter.getPets"
args <- list(key = "<key>", id = "KY102", format = "json", output = "full", count = "2")
api_json <- GET(url = URL, query = args)
# lets you know if any errors occurred in the GET request
stop_for_status(api_json)
# creates character vector that's needed for fromJSON
content_json <- content(api_json, as = "text", encoding = "UTF-8")
The data I’ll be using for this post are available at my github: content_json_eda and content_json. prettify lets us investigate the nested structure of the JSON we pulled from the API. I’m only showing the top part, as these structures can be quite large and complex. Here we can see that the meat of the data we want to work with starts three levels down: petfinder –> pets –> pet.
# example data; character vector
content_json_eda <- read_rds("vector_json.rds") # 2 records
content_json <- read_rds("vector_json2.rds") # 200 records
# Get a sense of the nested structure.
content_json_eda %>% prettify
{
"@encoding": "iso-8859-1",
"@version": "1.0",
"petfinder": {
"@xmlns:xsi": "http://www.w3.org/2001/XMLSchema-instance",
"lastOffset": {
"$t": "2"
},
"pets": {
"pet": [
{
"options": {
"option": {
"$t": "altered"
}
},
"status": {
"$t": "A"
},
"contact": {
...
Formulate the tweet
I haven’t experienced any problems with the Petfinder.com API, but if errors do occur, they should be captured by the stop_for_status function earlier. If not, you can check obj_json$header$status$message for answers. Definitions for the status messages can be found in the API documentation.
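If stop_for_status passes but the payload still looks off, a quick diagnostic sketch like the one below lets you eyeball that status block; this assumes obj_json has been created with fromJSON (as in the next chunk), and the exact nesting (e.g. values sitting under $t) may vary.
# inspect the API's own status block for the code/message pair
# defined in the API documentation
api_status <- obj_json$header$status
str(api_status)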
After applying fromJSON and flatten, we have a dataframe that we can manipulate with familiar data cleaning methods. Most of the cleaning isn’t noteworthy, but I will mention the weights variable that I created. It’s an attempt to give pets that have had longer stays in the shelter a greater opportunity to be noticed by potential adopters: the greater the weight, the more likely that pet is to be selected. It’s based on the lastUpdate variable, which isn’t defined in the API documentation. I’m assuming it’s the last time the shelter updated the pet’s information, but it could be the last time Petfinder updated their information (or something else entirely). If it’s the former, then judging by the lack of variation in the status column, it might provide some measure of length of stay. If it’s the latter, the effort could be a complete whiff.
# creates list of nested data.frames
obj_json <- fromJSON(content_json)
# api_message <- obj_json$header$status$message
# flatten creates a df; some cols have ".$t" in their names.
# The weights more heavily favor pets that have been in the shelter
# longer, so those pets get a greater opportunity to be seen.
pet_df <- flatten(obj_json$petfinder$pets$pet) %>%
rename_at(vars(ends_with(".$t")), ~str_replace(., "\\.\\$t", "")) %>%
mutate(lastUpdate = as.POSIXct(lastUpdate),
link = paste0("https://www.petfinder.com/petdetail/", id),
sex = recode(sex, "F" = "Female", "M" = "Male"),
size = recode(size, "L" = "Large", "S" = "Small",
"M" = "Medium", "XL" = "Extra Large"),
status = recode(status, "A" = "Adoptable", "H" = "Hold",
"P" = "Pending", "X" = "Adopted/Removed"),
name = lettercase::str_title_case(tolower(name)),
elapsedTime = round(Sys.time() - lastUpdate, 0),
rank = rank(elapsedTime)
) %>%
mutate(weights = portfolio::weight(., in.var = "rank", type = "linear", sides = "long", size = "all")) %>%
sample_n(size = 1, weight = weights)
glimpse(pet_df)
## Observations: 1
## Variables: 26
## $ options.option <chr> "altered"
## $ status <chr> "Adoptable"
## $ contact.phone <chr> "(502) 473-7387"
## $ contact.state <chr> "KY"
## $ contact.address2 <chr> "3705 Manslick Road Intake Facility"
## $ contact.email <chr> "AnimalServicesAdoption@louisvilleky.gov"
## $ contact.city <chr> "Louisville"
## $ contact.zip <chr> "40218"
## $ contact.address1 <chr> "3516 Newburg Road Adoption Center"
## $ age <chr> "Senior"
## $ size <chr> "Medium"
## $ media.photos.photo <list> [<c("pnt", "fpm", "x", "pn", "t"), c("http...
## $ id <chr> "41246124"
## $ shelterPetId <chr> "A637255"
## $ breeds.breed <list> [["Beagle"]]
## $ name <chr> "Willy"
## $ sex <chr> "Male"
## $ description <chr> NA
## $ mix <chr> "no"
## $ shelterId <chr> "KY102"
## $ lastUpdate <dttm> 2018-03-23
## $ animal <chr> "Dog"
## $ link <chr> "https://www.petfinder.com/petdetail/41246124"
## $ elapsedTime <time> 45 days
## $ rank <dbl> 99.5
## $ weights <dbl> 0.01230349
Looking at the columns in pet_df, you can see some of the classes are lists that potentially need unnesting. options.option can have values that give information about whether the pet has been neutered (“altered”), housebroken, good with other animals, etc. breeds.breed can have multiple values if the animal is a mixed breed.
Since unnesting these columns can result in different numbers of rows, the unnesting needs to occur in separate code chunks. Also, the column names depend on whether the column is a nested list or not. For example, if breeds.breed is nested, there will be a $t appended to the column name. As long as these columns are non-NA, they’re added to bot_df, which will be the primary dataframe we use to create the tweet.
# Dataframe to add columns to
bot_df <- pet_df %>%
select(`pet type` = animal, age, sex, size, link)
# Different colnames depending on nrows unnested
if(!is.na(pet_df$options.option)) {
pet_options <- pet_df %>%
select(options.option) %>%
unnest %>%
rename_at(vars(matches("\\$t")), ~str_replace(., "\\$t", "options")) %>%
rename_at(vars(matches("options.option")), ~str_replace(., "options.option", "options")) %>%
summarize(misc = glue::collapse(options, sep = ", "))
bot_df <- bot_df %>%
bind_cols(pet_options) %>%
select(`pet type`, misc, everything())
}
if(!is.na(pet_df$breeds.breed)) {
pet_breeds <- pet_df %>%
select(breeds.breed) %>%
unnest %>%
rename_at(vars(matches("\\$t")), ~str_replace(., "\\$t", "breeds")) %>%
rename_at(vars(matches("breeds.breed")), ~str_replace(., "breeds.breed", "breeds")) %>%
summarize(`breed(s)` = glue::collapse(breeds, sep = ", "))
bot_df <- bot_df %>%
bind_cols(pet_breeds) %>%
select(`pet type`, `breed(s)`, everything())
}
glimpse(bot_df)
## Observations: 1
## Variables: 7
## $ `pet type` <chr> "Dog"
## $ `breed(s)` <chr> "Beagle"
## $ misc <chr> "altered"
## $ age <chr> "Senior"
## $ sex <chr> "Male"
## $ size <chr> "Medium"
## $ link <chr> "https://www.petfinder.com/petdetail/41246124"
select_if keeps columns without values out of our message template, so the tweet doesn’t display an “NA” when a column is vacant. On a side note, I have really started to love these special-case dplyr functions (-if, -at, -all). They’ve become such mental energy savers for me. I recommend keeping Suzan Baert’s indispensable dplyr tutorial series bookmarked, because there are some syntax quirks when using these functions that I don’t find completely intuitive.
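To make the select_if behavior concrete, here’s a toy sketch; the tibble and its columns are made up for illustration.
# one row with a vacant description column
toy <- tibble(age = "Senior", sex = "Male", description = NA_character_)
# keep only the columns whose value is non-NA;
# returns a 1 x 2 tibble (age, sex) -- description is dropped
toy %>% select_if(~!is.na(.))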
I had never worked with glue before this project, but wow, what a cool package. Here, I’ve layered two different templates by collapsing the first template, adding another column, and using glue_data again to create my final message. If you’re unfamiliar with the package, there are two similar functions, glue and glue_data. The only difference is that glue_data looks for variables in its first argument instead of the calling environment. So, for example, glue_data won’t pay attention to any name variable defined outside of my pipeline, because bot_df is its first argument.
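A toy contrast makes the difference plain (both name values below are made up):
name <- "Calling Environment"
df <- data.frame(name = "First Argument")
glue::glue("Hello, {name}")          # Hello, Calling Environment
glue::glue_data(df, "Hello, {name}") # Hello, First Argument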
Also, you should be aware that Petfinder.com requires including “Powered by Petfinder.com” in your app. It’s a small price to pay in order to help some pets find good homes.
message <- bot_df %>%
select_if(~!is.na(.)) %>%
gather %>%
mutate(spec = glue::glue_data(., "
{key}: {value}
")) %>%
summarize(spec = glue::collapse(spec, sep = "\n")) %>%
add_column(name = pet_df$name) %>%
mutate(message = glue::glue_data(., "
{name} is:
{spec}
#adoptdontshop #rescue #adoptme #shelterpets
Powered by Petfinder.com
")) %>%
select(message)
head(message)
## message
## 1 Willy is:\npet type: Dog\nbreed(s): Beagle\nmisc: altered\nage: Senior\nsex: Male\nsize: Medium\nlink: https://www.petfinder.com/petdetail/41246124\n#adoptdontshop #rescue #adoptme #shelterpets\nPowered by Petfinder.com
You can’t have a pet app and not have an oversized image that unleashes all the cuteness. Gotta tug on those heartstrings. Unfortunately, Twitter doesn’t display images from an image link, so we’re going to have to jump through some hoops. Twitter does permit image files to be uploaded, and one of our columns has links to pictures of varying resolutions. We pull the image into our environment from the URL and write it to a temporary file on disk that will be fed to post_tweet later.
# Some images are blurry on Twitter, but row 3 seems to work alright
if(is.null(pet_df$media.photos.photo[[1]])) {
# Default photo if no photo is provided
img_url <- "http://www.dogsinpictures.com/images/dog-cat-bunny-bird-love.jpg"
image_obj <- magick::image_read(img_url)
tmp <- tempfile(fileext=".jpg")
magick::image_write(image_obj, path = tmp, format = "jpg")
} else {
img_df <- pet_df %>%
select(media.photos.photo) %>%
unnest %>%
slice(3)
img_url <- img_df$`$t`[[1]]
image_obj <- magick::image_read(img_url)
tmp <- tempfile(fileext=".jpg")
magick::image_write(image_obj, path = tmp, format = "jpg")
}
Tweet and automate
We’re set to tweet!
post_tweet(message[[1]], media = tmp)
Louise is:
pet type: Dog
breed(s): Pit Bull Terrier
misc: altered
age: Adult
sex: Female
size: Medium
link: https://t.co/4h99GrVaMJ
#adoptdontshop #rescue #adoptme #shelterpets
Powered by https://t.co/T0HSpQ8toe pic.twitter.com/ojgz1JwK42
— Eric (@erbo_exp_bot) April 30, 2018
To be a bot, though, it needs to be automated. One way to accomplish this is to set up Windows Task Scheduler. You can create the task through the program itself; in that case, I recommend you check out the bottom of McCann’s post on RPubs. I used the taskscheduleR package, which I preferred because of the log file it creates. Task Scheduler has a history tab and a log, but it doesn’t show you any useful information if an error is thrown. If you do use taskscheduleR and its add-in, make sure the date format matches your computer’s.
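For reference, here’s a minimal scheduling sketch with taskscheduleR; the task name, script path, and times are placeholders you’d adjust for your own setup.
library(taskscheduleR)
taskscheduler_create(
  taskname  = "pet_bot",                      # hypothetical task name
  rscript   = "C:/path/to/pet_bot.R",         # hypothetical path to your bot script
  schedule  = "DAILY",
  starttime = "09:00",
  startdate = format(Sys.Date(), "%m/%d/%Y")  # must match your computer's date format
)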
Once your task is initiated, I recommend setting a few options in Task Scheduler. In your task’s General tab, you want to make sure the “Run with highest privileges” box is ticked; otherwise, the UAC box pops up every time the task runs. Also, tick “Run whether user is logged on or not” and (if necessary) “Do not store password…”. Ticking those boxes will make the task run in the background without opening a window. There are some other settings worth considering, so you should do some investigating on your own. Also, make sure Rscript.exe has high enough permissions. I gave mine “full control”, but it may not need to be that high.
Lastly, you have to read in the token and point to it explicitly in post_tweet. Otherwise, for some reason, when the script is executed, lines get written to your .Renviron and bogus copies of your token file are created.
twitter_token <- read_rds("<home directory>twitter_token.rds")
post_tweet(message[[1]], media = tmp, token = twitter_token)
file.remove(tmp)
Session info
Session info --------------------------------------------------------------------------------------------------------
setting value
version R version 3.5.0 (2018-04-23)
system i386, mingw32
ui RStudio (1.1.423)
language (EN)
collate English_United States.1252
tz America/New_York
date 2018-05-03
Packages ------------------------------------------------------------------------------------------------------------
package * version date source
assertthat 0.2.0 2017-04-11 CRAN (R 3.5.0)
base * 3.5.0 2018-04-23 local
bindr 0.1.1 2018-03-13 CRAN (R 3.5.0)
bindrcpp * 0.2.2 2018-03-29 CRAN (R 3.5.0)
broom 0.4.4 2018-03-29 CRAN (R 3.5.0)
cellranger 1.1.0 2016-07-27 CRAN (R 3.5.0)
cli 1.0.0 2017-11-05 CRAN (R 3.5.0)
colorspace 1.3-2 2016-12-14 CRAN (R 3.5.0)
compiler 3.5.0 2018-04-23 local
crayon 1.3.4 2017-09-16 CRAN (R 3.5.0)
curl 3.2 2018-03-28 CRAN (R 3.5.0)
datasets * 3.5.0 2018-04-23 local
devtools 1.13.5 2018-02-18 CRAN (R 3.5.0)
digest 0.6.15 2018-01-28 CRAN (R 3.5.0)
dplyr * 0.7.4 2017-09-28 CRAN (R 3.5.0)
forcats * 0.3.0 2018-02-19 CRAN (R 3.5.0)
foreign 0.8-70 2017-11-28 CRAN (R 3.5.0)
ggplot2 * 2.2.1 2016-12-30 CRAN (R 3.5.0)
glue 1.2.0 2017-10-29 CRAN (R 3.5.0)
graphics * 3.5.0 2018-04-23 local
grDevices * 3.5.0 2018-04-23 local
grid 3.5.0 2018-04-23 local
gtable 0.2.0 2016-02-26 CRAN (R 3.5.0)
haven 1.1.1 2018-01-18 CRAN (R 3.5.0)
hms 0.4.2 2018-03-10 CRAN (R 3.5.0)
httr * 1.3.1 2017-08-20 CRAN (R 3.5.0)
jsonlite * 1.5 2017-06-01 CRAN (R 3.5.0)
knitr 1.20 2018-02-20 CRAN (R 3.5.0)
lattice 0.20-35 2017-03-25 CRAN (R 3.5.0)
lazyeval 0.2.1 2017-10-29 CRAN (R 3.5.0)
lettercase 0.13.1 2016-03-03 CRAN (R 3.5.0)
lubridate 1.7.4 2018-04-11 CRAN (R 3.5.0)
magick 1.8 2018-03-19 CRAN (R 3.5.0)
magrittr 1.5 2014-11-22 CRAN (R 3.5.0)
memoise 1.1.0 2017-04-21 CRAN (R 3.5.0)
methods * 3.5.0 2018-04-23 local
mime 0.5 2016-07-07 CRAN (R 3.5.0)
mnormt 1.5-5 2016-10-15 CRAN (R 3.5.0)
modelr 0.1.1 2017-07-24 CRAN (R 3.5.0)
munsell 0.4.3 2016-02-13 CRAN (R 3.5.0)
nlme 3.1-137 2018-04-07 CRAN (R 3.5.0)
openssl 1.0.1 2018-03-03 CRAN (R 3.5.0)
parallel 3.5.0 2018-04-23 local
pillar 1.2.1 2018-02-27 CRAN (R 3.5.0)
pkgconfig 2.0.1 2017-03-21 CRAN (R 3.5.0)
plyr 1.8.4 2016-06-08 CRAN (R 3.5.0)
portfolio 0.4-7 2015-01-29 CRAN (R 3.5.0)
psych 1.8.3.3 2018-03-30 CRAN (R 3.5.0)
purrr * 0.2.4 2017-10-18 CRAN (R 3.5.0)
R6 2.2.2 2017-06-17 CRAN (R 3.5.0)
Rcpp 0.12.16 2018-03-13 CRAN (R 3.5.0)
readr * 1.1.1 2017-05-16 CRAN (R 3.5.0)
readxl 1.1.0 2018-04-20 CRAN (R 3.5.0)
reshape2 1.4.3 2017-12-11 CRAN (R 3.5.0)
rlang 0.2.0 2018-02-20 CRAN (R 3.5.0)
rstudioapi 0.7 2017-09-07 CRAN (R 3.5.0)
rtweet * 0.6.0 2017-11-16 CRAN (R 3.5.0)
rvest 0.3.2 2016-06-17 CRAN (R 3.5.0)
scales 0.5.0 2017-08-24 CRAN (R 3.5.0)
stats * 3.5.0 2018-04-23 local
stringi 1.1.7 2018-03-12 CRAN (R 3.5.0)
stringr * 1.3.0 2018-02-19 CRAN (R 3.5.0)
tibble * 1.4.2 2018-01-22 CRAN (R 3.5.0)
tidyr * 0.8.0 2018-01-29 CRAN (R 3.5.0)
tidyselect 0.2.4 2018-02-26 CRAN (R 3.5.0)
tidyverse * 1.2.1 2017-11-14 CRAN (R 3.5.0)
tools 3.5.0 2018-04-23 local
utils * 3.5.0 2018-04-23 local
withr 2.1.2 2018-03-15 CRAN (R 3.5.0)
xml2 1.2.0 2018-01-24 CRAN (R 3.5.0)
yaml 2.1.18 2018-03-08 CRAN (R 3.5.0)
Next Steps
This method should work fine if you’re hosting on a local machine, but for cloud-based hosting, the parts where we write/read the image and token to/from disk may have to be changed so these things are stored in memory. With the current rtweet package, I don’t think the solution is straightforward. I haven’t delved into it too deeply, but one problem is that the image input for post_tweet is limited to a “path”. I did do a cursory examination of the post_tweet function, and I think it can be modified without too much trouble to accept an image object that’s stored in memory.
As for the token portion of the issue, I haven’t looked into it at all. The twitteR package is scheduled for a “leisurely deprecation”, and from my memory of a few articles that used that package, I think there might be some code in there that could help. It’s just a hunch, though. If anyone finds a solution, I’d be very interested.
Acknowledgements
Thanks to Steve Charlesworth for taking the time to answer some questions regarding his bot.
Thanks to Matthew Gotth-Olsen for liaising with Louisville Metro Animal Services.
Special Thanks to
References
[1] C. Boettiger. knitcitations: Citations for ‘Knitr’ Markdown Files. R package version 1.0.8. 2017. URL: https://CRAN.R-project.org/package=knitcitations.
[2] C. Brown. lettercase: Utilities for Formatting Strings with Consistent Capitalization, Word Breaks and White Space. R package version 0.13.1. 2016. URL: https://CRAN.R-project.org/package=lettercase.
[3] J. Wijffels. taskscheduleR: Schedule R Scripts and Processes with the Windows Task Scheduler. R package version 1.1. 2017. URL: https://CRAN.R-project.org/package=taskscheduleR.
[4] J. Hester. glue: Interpreted String Literals. R package version 1.2.0. 2017. URL: https://CRAN.R-project.org/package=glue.
[5] M. W. Kearney. rtweet: Collecting Twitter Data. R package version 0.6.0. 2017. URL: https://cran.r-project.org/package=rtweet.
[6] J. Ooms. magick: Advanced Graphics and Image-Processing in R. R package version 1.8. 2018. URL: https://CRAN.R-project.org/package=magick.
[7] J. Ooms. “The jsonlite Package: A Practical and Consistent Mapping Between JSON Data and R Objects”. In: arXiv:1403.2805 [stat.CO] (2014). URL: https://arxiv.org/abs/1403.2805.
[8] H. Wickham. httr: Tools for Working with URLs and HTTP. R package version 1.3.1. 2017. URL: https://CRAN.R-project.org/package=httr.
[9] H. Wickham. tidyverse: Easily Install and Load the ‘Tidyverse’. R package version 1.2.1. 2017. URL: https://CRAN.R-project.org/package=tidyverse.