@ctrent I’ve always thought that it’d be fun to debate the @Reds HOF like the real one. Set a high , but debatable, bar. Bring on the JAWS
— Craig Wales (@C_Dubs1) August 22, 2017
it would be interesting, I don't have the math or computer skills to do so https://t.co/p5n9O5NjGT
— C. Trent Rosecrans (@ctrent) August 22, 2017
In part one, we got the player WAR values, primary positions, and determined the tenure qualification. In part two, we’ll perform the JAWS calculations and create dataframes that will be needed for our dashboard. Recall that for these calculations, you’ll need indWar
, nomWar
, posDat
, and warDat
from part one.
JAWS Calculation
To compute the JAWS values, we’ll take the average of the sum of the top four WAR values and the sum of the WAR accrued while playing for the Reds
library(tidyverse)
# total WAR during Reds tenure
warSum <- warDat %>%
group_by(playerId) %>%
summarize(WARtenure = sum(rWAR)) %>%
ungroup()
# Sum of top 4 WAR years
war4Dat <- warDat %>%
group_by(playerId) %>%
top_n(4, rWAR) %>%
tally(rWAR) %>%
rename(WAR4 = n)
# Calculating JAWS
warJaws <- warSum %>%
inner_join(war4Dat, by = 'playerId') %>%
mutate(JAWS4 = round((WARtenure + WAR4)/2, 2)) %>%
select(playerId, WARtenure, WAR4, JAWS4)
# Add Names and Positions to dataframe
names <- warDat %>%
select(playerId, Name) %>%
distinct()
warJaws <- warJaws %>%
inner_join(posDat, by = 'playerId') %>%
inner_join(names, by = 'playerId') %>%
select(playerId, Name, POS, everything())
head(warJaws, 3)
## # A tibble: 3 x 6
## playerId Name POS WARtenure WAR4 JAWS4
## <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 becklja01 Jake Beckley 1B 23.5 16.5 20.0
## 2 bellgu01 Gus Bell CF 13.0 12.4 12.7
## 3 benchjo01 Johnny Bench C 75.0 30.4 52.7
Weighting Positions and Averages
The number of players at each non-pitcher position differs quite a bit and will skew our averages, so we’ll add “average” Hall of Fame players to each position pool to reduce the bias. Pitchers aren’t compared to positional players statistically so there will be two sets of average calculations. They also aren’t subdivided into Starting and Relief so there will be no need to add “average” players to their pool.
# Only want inductees in our average calculation
indJaws <- warJaws %>%
anti_join(nomWar, by = 'playerId')
batJaws <- indJaws %>%
select(-playerId) %>%
filter(POS != "P")
# 1B and CF are highest with 10 members a piece so they won't need filler players
table(batJaws$POS)
##
## 1B 2B 3B C CF LF RF SS
## 10 7 2 4 10 5 5 6
First base and Center Field have the most players so we’ll add “average” players to the other position pools until the amounts are equal.
# Number of filler players needed at each position
neededPOS <- batJaws %>%
group_by(POS) %>%
summarize(n = n()) %>%
mutate(remPOS = max(n) - n) %>%
filter(POS != "1B", POS != "CF") %>%
select(-n)
# List of lists with filler position amounts
posLL <- map2(neededPOS$POS, neededPOS$remPOS, function(POS, n) {
POS <- rep(POS, n)
})
# Create tibble with all the filler players for each position
# Empty tibble
posFillTib <- tibble(
Name = character(),
POS = character(),
WARtenure = numeric(),
WAR4 = numeric(),
JAWS4 = numeric()
)
# input: Position; function creates one filler player with avgHOF stats
fillPOS <- function(POS) {
posFillTib <- posFillTib %>%
add_row(Name = "avgHOFplayer",
POS = POS,
WARtenure = median(batJaws$WARtenure),
WAR4 = median(batJaws$WAR4),
JAWS4 = median(batJaws$JAWS4)
)
}
# List of lists fed to function; outputs tibble of filler players
fillerPlayers <- map_dfr(posLL, fillPOS)
# Creating weighted distribution of position players
wtBatDistr <- batJaws %>%
bind_rows(fillerPlayers)
head(wtBatDistr, 3)
## # A tibble: 3 x 5
## Name POS WARtenure WAR4 JAWS4
## <chr> <chr> <dbl> <dbl> <dbl>
## 1 Jake Beckley 1B 23.5 16.5 20.0
## 2 Gus Bell CF 13.0 12.4 12.7
## 3 Johnny Bench C 75.0 30.4 52.7
We can now calculate the averages using some cool, nested purrr::map
action.
# Calculate weighted averages at each position
wbd_nested <- wtBatDistr %>%
group_by(POS) %>%
nest()
wt_avg_FUN <- function(df) {
mutate(df, `Wt Avg WAR` = round(mean(WARtenure), 1),
`Wt Avg WAR4` = round(mean(WAR4), 1),
`Wt Avg JAWS4` = round(mean(JAWS4), 1))
}
wbd_avgs <- wbd_nested %>%
mutate(stats = map(data, wt_avg_FUN)) %>%
select(POS, stats) %>%
unnest() %>%
select(Name, POS, everything()) %>%
filter(Name != "avgHOFplayer")
head(wbd_avgs, 3)
## # A tibble: 3 x 8
## Name POS WARtenure WAR4 JAWS4 `Wt Avg WAR` `Wt Avg WAR4`
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Jake Beckley 1B 23.5 16.5 20.0 22.3 15.4
## 2 Sean Casey 1B 16.6 13.3 14.9 22.3 15.4
## 3 Gordy Coleman 1B 7.27 7.18 7.23 22.3 15.4
## # ... with 1 more variable: `Wt Avg JAWS4` <dbl>
Create Dataframes for Visuals
DataTable
Our first visual will be a DT
datatable with the WAR and JAWS calculations for each player. The positional JAWS and WAR averages we calculated above using only the inductees will be added to the nominee stat lines according to their primary position. Then pitcher averages are figured, and everything is combined into one dataframe.
# Get positional player nominees
nomBatJaws <- warJaws %>%
anti_join(indWar, by = 'playerId') %>%
filter(POS != "P") %>%
select(-playerId)
# Sync averages to nominee positions and combine with inductee averages dataframe
wtBatJaws <- nomBatJaws %>%
mutate(`Wt Avg WAR` = plyr::mapvalues(POS, from = wbd_avgs$POS,
to = wbd_avgs$`Wt Avg WAR`) %>% as.numeric(),
`Wt Avg WAR4` = plyr::mapvalues(POS, from = wbd_avgs$POS,
to = wbd_avgs$`Wt Avg WAR4`) %>% as.numeric(),
`Wt Avg JAWS4` = plyr::mapvalues(POS, from = wbd_avgs$POS,
to = wbd_avgs$`Wt Avg JAWS4`) %>% as.numeric()) %>%
bind_rows(wbd_avgs)
# Pitcher averages
pitJaws <- warJaws %>%
anti_join(nomWar, by = 'playerId') %>%
select(-playerId) %>%
filter(POS == "P") %>%
mutate(`Wt Avg WAR` = round(mean(WARtenure), 1),
`Wt Avg WAR4` = round(mean(WAR4), 1),
`Wt Avg JAWS4` = round(mean(JAWS4), 1))
# Get pitcher Nominees
nomPitJaws <- warJaws %>%
anti_join(indWar, by = 'playerId') %>%
filter(POS == "P") %>%
select(-playerId)
# Sync (pitcher pool not actually weighted)
wtPitJaws <- nomPitJaws %>%
mutate(`Wt Avg WAR` = plyr::mapvalues(POS, from = pitJaws$POS,
to = pitJaws$`Wt Avg WAR`) %>% as.numeric(),
`Wt Avg WAR4` = plyr::mapvalues(POS, from = pitJaws$POS,
to = pitJaws$`Wt Avg WAR4`) %>% as.numeric(),
`Wt Avg JAWS4` = plyr::mapvalues(POS, from = pitJaws$POS, to = pitJaws$`Wt Avg JAWS4`) %>% as.numeric()) %>%
bind_rows(pitJaws)
display_table <- wtBatJaws %>%
bind_rows(wtPitJaws) %>%
arrange(Name)
head(display_table, 3)
## # A tibble: 3 x 8
## Name POS WARtenure WAR4 JAWS4 `Wt Avg WAR` `Wt Avg WAR4`
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Aaron Boone 3B 11.6 10.0 10.8 20.4 14.9
## 2 Adam Dunn LF 16.4 11.9 14.2 29.0 16.6
## 3 Barry Larkin SS 70.2 26.2 48.2 23.5 14.2
## # ... with 1 more variable: `Wt Avg JAWS4` <dbl>
Cleveland Dot Plots
When comparing position players, position to position isn’t the only comparison that can be made. In some situations, it’s more fair to look at wider, positional group statistics. There are five groups that we’ll use: corner infielders, middle infielders, outfielders, corners, and middle. These groups along with the positional JAWS and WAR calculations will be visualized with Cleveland Dot Plots.
# Build df with group positions
cornerIF <- warJaws %>%
filter(POS == "1B" | POS == "3B") %>%
mutate(POS = plyr::mapvalues(POS, from = c("1B", "3B"),
to = c("CI", "CI")))
middleIF <- warJaws %>%
filter(POS == "2B" | POS == "SS") %>%
mutate(POS = plyr::mapvalues(POS, from = c("2B", "SS"),
to = c("MI", "MI")))
outField <- warJaws %>%
filter(POS == "LF" | POS == "CF" | POS == "RF") %>%
mutate(POS = plyr::mapvalues(POS, from = c("LF", "CF", "RF"),
to = c("OF", "OF", "OF")))
corners <- warJaws %>%
filter(POS == "1B" | POS == "3B" | POS == "LF" | POS == "RF") %>%
mutate(POS = plyr::mapvalues(POS, from = c("1B", "LF", "RF", "3B"),
to = c("CO", "CO", "CO", "CO")))
middle <- warJaws %>%
filter(POS == "2B" | POS == "SS" | POS == "C" | POS == "CF") %>%
mutate(POS = plyr::mapvalues(POS, from = c("2B", "SS", "C", "CF"),
to = c("Md", "Md", "Md", "Md")))
other_groups <- cornerIF %>%
bind_rows(middleIF, outField, corners, middle)
# Calculate averages of each group
other_groups_i <- other_groups %>%
anti_join(nomWar, by = 'playerId')
og_nested <- other_groups_i %>%
group_by(POS) %>%
nest()
avg_FUN <- function(df) {
mutate(df, WAR_avg = round(mean(WARtenure), 1),
WAR4_avg = round(mean(WAR4), 1),
JAWS_avg = round(mean(JAWS4), 1))
}
group_avgs_i <- og_nested %>%
mutate(stats = map(data, avg_FUN)) %>%
select(POS, stats) %>%
unnest() %>%
select(playerId, Name, POS, everything())
# Add Nominees
other_groups_n <- other_groups %>%
anti_join(indWar, by = 'playerId')
group_avgs <- other_groups_n %>%
mutate(WAR_avg = plyr::mapvalues(POS, from = group_avgs_i$POS,
to = group_avgs_i$WAR_avg) %>% as.numeric(),
WAR4_avg = plyr::mapvalues(POS, from = group_avgs_i$POS,
to = group_avgs_i$WAR4_avg) %>% as.numeric(),
JAWS_avg = plyr::mapvalues(POS, from = group_avgs_i$POS,
to = group_avgs_i$JAWS_avg) %>% as.numeric()) %>%
bind_rows(group_avgs_i)
# Prepare dataframe for JAWS dot chart
dot_table <- display_table %>%
rename(JAWS_avg = `Wt Avg JAWS4`, WAR_avg = `Wt Avg WAR`) %>%
bind_rows(group_avgs)
jaws_group <- dot_table %>%
select(Name, POS, JAWS4, JAWS_avg) %>%
rename(Group = POS, `Avg HOF` = JAWS_avg) %>%
gather(key = "Stat", value = "Value", -c(Name, Group))
# Prepare dataframe for WAR dot chart
war_group <- dot_table %>%
select(Name, POS, WARtenure, WAR_avg) %>%
rename(Group = POS, `Avg HOF` = WAR_avg, WAR = WARtenure) %>%
gather(key = "Stat", value = "Value", -c(Name, Group))
glimpse(war_group)
## Observations: 368
## Variables: 4
## $ Name <chr> "Aaron Boone", "Adam Dunn", "Barry Larkin", "Bid McPhee"...
## $ Group <chr> "3B", "LF", "SS", "2B", "SS", "P", "P", "P", "C", "P", "...
## $ Stat <chr> "WAR", "WAR", "WAR", "WAR", "WAR", "WAR", "WAR", "WAR", ...
## $ Value <dbl> 11.61, 16.44, 70.17, 52.39, 13.55, 29.07, 22.26, 7.43, 1...
Interactive Line Chart
The final visual will be a line graph of player WAR values for each season played with the Reds. We’ll add some emphasis to the four largest WAR values and a horizontal line to indicate a typical Hall of Famer.
# WAR4 + years; add type column
war4Dat <- warDat %>%
group_by(playerId) %>%
top_n(4, rWAR) %>%
ungroup() %>%
select(-teamId) %>%
add_column(type = rep("WAR4", 328))
# Not WAR4 + years; add type column
notWar4 <- warDat %>%
anti_join(war4Dat, by = c("playerId", "yearId")) %>%
select(-teamId) %>%
add_column(type = rep("WAR", 427))
war_combined <- notWar4 %>%
bind_rows(war4Dat)
# Positional and Pitcher seasonal average WAR values
pitMedWar <- war_combined %>%
filter(POS == "P") %>%
summarize(`Median Pitcher WAR` = median(rWAR))
posMedWAR <- war_combined %>%
filter(POS != "P") %>%
summarize(`Median Position WAR` = median(rWAR))
war_combo_avg <- war_combined %>%
mutate(`Median WAR` = if_else(POS == "P",
pitMedWar$`Median Pitcher WAR`[[1]],
posMedWAR$`Median Position WAR`[[1]])) %>%
rename(bbref_id = playerId, WAR = rWAR) %>%
select(bbref_id, Name, everything())
Save Objects
Only four objects will be required for the final post in this series: display_table
, jaws_group
, war_group
, and war_combo_avg
.
Conclusion
In this post, we calculated the JAWS statistic and positional averages that can be used to evaluate nominees and compare members of the Reds Hall of Fame. We also created positional group averages which could come in handy in certain situations such as with players that were more versatile and played multiple positions throughout their career. Lastly, we produced the data sets that will be used in our visuals for part three.
References
[1] C. Boettiger. knitcitations: Citations for ‘Knitr’ Markdown Files. R package version 1.0.8. 2017. URL: https://CRAN.R-project.org/package=knitcitations.
[2] H. Wickham. “The Split-Apply-Combine Strategy for Data Analysis”. In: Journal of Statistical Software 40.1 (2011), pp. 1–29. URL: http://www.jstatsoft.org/v40/i01/.
[3] H. Wickham. tidyverse: Easily Install and Load the ‘Tidyverse’. R package version 1.2.1. 2017. URL: https://CRAN.R-project.org/package=tidyverse.