{0xc4200ac840 {{{0 0} 0 0 0 0} map[]}}

Analytical Endeavors

Mar 21, 2018 - 9 minute read - Comments - sports-analysis

A Baseball Dashboard in Time for Opening Weekend (part two)

In part one, we got the player WAR values, primary positions, and determined the tenure qualification. In part two, we’ll perform the JAWS calculations and create dataframes that will be needed for our dashboard. Recall that for these calculations, you’ll need indWar, nomWar, posDat, and warDat from part one.

JAWS Calculation

To compute the JAWS values, we’ll take the average of the sum of the top four WAR values and the sum of the WAR accrued while playing for the Reds

library(tidyverse)

# total WAR during Reds tenure
warSum <- warDat %>%
      group_by(playerId) %>%
      summarize(WARtenure = sum(rWAR)) %>% 
      ungroup()

# Sum of top 4 WAR years
war4Dat <- warDat %>%
      group_by(playerId) %>%
      top_n(4, rWAR) %>%
      tally(rWAR) %>%
      rename(WAR4 = n)

# Calculating JAWS
warJaws <- warSum %>% 
      inner_join(war4Dat, by = 'playerId') %>% 
      mutate(JAWS4 = round((WARtenure + WAR4)/2, 2)) %>% 
      select(playerId, WARtenure, WAR4, JAWS4)

# Add Names and Positions to dataframe
names <- warDat %>% 
      select(playerId, Name) %>% 
      distinct()

warJaws <- warJaws %>%
      inner_join(posDat, by = 'playerId') %>% 
      inner_join(names, by = 'playerId') %>% 
      select(playerId, Name, POS, everything())


head(warJaws, 3)
## # A tibble: 3 x 6
##   playerId  Name         POS   WARtenure  WAR4 JAWS4
##   <chr>     <chr>        <chr>     <dbl> <dbl> <dbl>
## 1 becklja01 Jake Beckley 1B         23.5  16.5  20.0
## 2 bellgu01  Gus Bell     CF         13.0  12.4  12.7
## 3 benchjo01 Johnny Bench C          75.0  30.4  52.7

Weighting Positions and Averages

The number of players at each non-pitcher position differs quite a bit and will skew our averages, so we’ll add “average” Hall of Fame players to each position pool to reduce the bias. Pitchers aren’t compared to positional players statistically so there will be two sets of average calculations. They also aren’t subdivided into Starting and Relief so there will be no need to add “average” players to their pool.

# Only want inductees in our average calculation
indJaws <- warJaws %>% 
      anti_join(nomWar, by = 'playerId')

batJaws <- indJaws %>%
      select(-playerId) %>% 
      filter(POS != "P")

# 1B and CF are highest with 10 members a piece so they won't need filler players
table(batJaws$POS)
## 
## 1B 2B 3B  C CF LF RF SS 
## 10  7  2  4 10  5  5  6

First base and Center Field have the most players so we’ll add “average” players to the other position pools until the amounts are equal.

# Number of filler players needed at each position
neededPOS <- batJaws %>%
      group_by(POS) %>%
      summarize(n = n()) %>% 
      mutate(remPOS = max(n) - n) %>%
      filter(POS != "1B", POS != "CF") %>%
      select(-n)

# List of lists with filler position amounts
posLL <- map2(neededPOS$POS, neededPOS$remPOS, function(POS, n) {
      POS <- rep(POS, n)
})

# Create tibble with all the filler players for each position

# Empty tibble
posFillTib <- tibble(
      Name = character(),
      POS = character(),
      WARtenure = numeric(),
      WAR4 = numeric(),
      JAWS4 = numeric()
      
)

# input: Position; function creates one filler player with avgHOF stats
fillPOS <- function(POS) {
      posFillTib <- posFillTib %>%
            add_row(Name = "avgHOFplayer",
                    POS = POS,
                    WARtenure = median(batJaws$WARtenure),
                    WAR4 = median(batJaws$WAR4),
                    JAWS4 = median(batJaws$JAWS4)
                    
            )
}
# List of lists fed to function; outputs tibble of filler players
fillerPlayers <- map_dfr(posLL, fillPOS)

# Creating weighted distribution of position players
wtBatDistr <- batJaws %>%
      bind_rows(fillerPlayers)

head(wtBatDistr, 3)
## # A tibble: 3 x 5
##   Name         POS   WARtenure  WAR4 JAWS4
##   <chr>        <chr>     <dbl> <dbl> <dbl>
## 1 Jake Beckley 1B         23.5  16.5  20.0
## 2 Gus Bell     CF         13.0  12.4  12.7
## 3 Johnny Bench C          75.0  30.4  52.7

We can now calculate the averages using some cool, nested purrr::map action.

# Calculate weighted averages at each position
wbd_nested <- wtBatDistr %>% 
      group_by(POS) %>% 
      nest()

wt_avg_FUN <- function(df) {
      mutate(df, `Wt Avg WAR` = round(mean(WARtenure), 1),
             `Wt Avg WAR4` = round(mean(WAR4), 1),
             `Wt Avg JAWS4` = round(mean(JAWS4), 1))
}

wbd_avgs <- wbd_nested %>% 
      mutate(stats = map(data, wt_avg_FUN)) %>% 
      select(POS, stats) %>% 
      unnest() %>% 
      select(Name, POS, everything()) %>% 
      filter(Name != "avgHOFplayer")

head(wbd_avgs, 3)
## # A tibble: 3 x 8
##   Name          POS   WARtenure  WAR4 JAWS4 `Wt Avg WAR` `Wt Avg WAR4`
##   <chr>         <chr>     <dbl> <dbl> <dbl>        <dbl>         <dbl>
## 1 Jake Beckley  1B        23.5  16.5  20.0          22.3          15.4
## 2 Sean Casey    1B        16.6  13.3  14.9          22.3          15.4
## 3 Gordy Coleman 1B         7.27  7.18  7.23         22.3          15.4
## # ... with 1 more variable: `Wt Avg JAWS4` <dbl>

Create Dataframes for Visuals

DataTable

Our first visual will be a DT datatable with the WAR and JAWS calculations for each player. The positional JAWS and WAR averages we calculated above using only the inductees will be added to the nominee stat lines according to their primary position. Then pitcher averages are figured, and everything is combined into one dataframe.

# Get positional player nominees
nomBatJaws <- warJaws %>% 
      anti_join(indWar, by = 'playerId') %>% 
      filter(POS != "P") %>% 
      select(-playerId)

# Sync averages to nominee positions and combine with inductee averages dataframe
wtBatJaws <- nomBatJaws %>% 
      mutate(`Wt Avg WAR` = plyr::mapvalues(POS, from = wbd_avgs$POS,
                                            to = wbd_avgs$`Wt Avg WAR`) %>% as.numeric(),
             `Wt Avg WAR4` = plyr::mapvalues(POS, from = wbd_avgs$POS,
                                             to = wbd_avgs$`Wt Avg WAR4`) %>% as.numeric(),
             `Wt Avg JAWS4` = plyr::mapvalues(POS, from = wbd_avgs$POS,
                                              to = wbd_avgs$`Wt Avg JAWS4`) %>% as.numeric()) %>% 
      bind_rows(wbd_avgs)


# Pitcher averages
pitJaws <- warJaws %>% 
      anti_join(nomWar, by = 'playerId') %>% 
      select(-playerId) %>% 
      filter(POS == "P") %>%
      mutate(`Wt Avg WAR` = round(mean(WARtenure), 1),
             `Wt Avg WAR4` = round(mean(WAR4), 1),
             `Wt Avg JAWS4` = round(mean(JAWS4), 1))

# Get pitcher Nominees
nomPitJaws <- warJaws %>% 
      anti_join(indWar, by = 'playerId') %>% 
      filter(POS == "P") %>% 
      select(-playerId)

# Sync (pitcher pool not actually weighted)
wtPitJaws <- nomPitJaws %>% 
      mutate(`Wt Avg WAR` = plyr::mapvalues(POS, from = pitJaws$POS,
                                            to = pitJaws$`Wt Avg WAR`) %>% as.numeric(),
             `Wt Avg WAR4` = plyr::mapvalues(POS, from = pitJaws$POS,
                                             to = pitJaws$`Wt Avg WAR4`) %>% as.numeric(),
             `Wt Avg JAWS4` = plyr::mapvalues(POS, from = pitJaws$POS, to = pitJaws$`Wt Avg JAWS4`) %>% as.numeric()) %>% 
      bind_rows(pitJaws)


display_table <- wtBatJaws %>% 
      bind_rows(wtPitJaws) %>% 
      arrange(Name)

head(display_table, 3)
## # A tibble: 3 x 8
##   Name         POS   WARtenure  WAR4 JAWS4 `Wt Avg WAR` `Wt Avg WAR4`
##   <chr>        <chr>     <dbl> <dbl> <dbl>        <dbl>         <dbl>
## 1 Aaron Boone  3B         11.6  10.0  10.8         20.4          14.9
## 2 Adam Dunn    LF         16.4  11.9  14.2         29.0          16.6
## 3 Barry Larkin SS         70.2  26.2  48.2         23.5          14.2
## # ... with 1 more variable: `Wt Avg JAWS4` <dbl>

Cleveland Dot Plots

When comparing position players, position to position isn’t the only comparison that can be made. In some situations, it’s more fair to look at wider, positional group statistics. There are five groups that we’ll use: corner infielders, middle infielders, outfielders, corners, and middle. These groups along with the positional JAWS and WAR calculations will be visualized with Cleveland Dot Plots.

# Build df with group positions
cornerIF <- warJaws %>% 
      filter(POS == "1B" | POS == "3B") %>%
      mutate(POS = plyr::mapvalues(POS, from = c("1B", "3B"),
                                   to = c("CI", "CI")))

middleIF <- warJaws %>% 
      filter(POS == "2B" | POS == "SS") %>%
      mutate(POS = plyr::mapvalues(POS, from = c("2B", "SS"),
                                   to = c("MI", "MI")))

outField <- warJaws %>% 
      filter(POS == "LF" | POS == "CF" | POS == "RF") %>%
      mutate(POS = plyr::mapvalues(POS, from = c("LF", "CF", "RF"),
                                   to = c("OF", "OF", "OF")))

corners <- warJaws %>% 
      filter(POS == "1B" | POS == "3B" | POS == "LF" | POS == "RF") %>% 
      mutate(POS = plyr::mapvalues(POS, from = c("1B", "LF", "RF", "3B"),
                                   to = c("CO", "CO", "CO", "CO")))

middle <- warJaws %>% 
      filter(POS == "2B" | POS == "SS" | POS == "C" | POS == "CF") %>% 
      mutate(POS = plyr::mapvalues(POS, from = c("2B", "SS", "C", "CF"),
                                   to = c("Md", "Md", "Md", "Md")))

other_groups <- cornerIF %>% 
      bind_rows(middleIF, outField, corners, middle)


# Calculate averages of each group

other_groups_i <- other_groups %>% 
      anti_join(nomWar, by = 'playerId')

og_nested <- other_groups_i %>% 
      group_by(POS) %>% 
      nest()

avg_FUN <- function(df) {
      mutate(df, WAR_avg = round(mean(WARtenure), 1),
             WAR4_avg = round(mean(WAR4), 1),
             JAWS_avg = round(mean(JAWS4), 1))
}

group_avgs_i <- og_nested %>% 
      mutate(stats = map(data, avg_FUN)) %>% 
      select(POS, stats) %>% 
      unnest() %>% 
      select(playerId, Name, POS, everything())

# Add Nominees

other_groups_n <- other_groups %>% 
      anti_join(indWar, by = 'playerId')

group_avgs <- other_groups_n %>% 
      mutate(WAR_avg = plyr::mapvalues(POS, from = group_avgs_i$POS,
                                       to = group_avgs_i$WAR_avg) %>% as.numeric(),
             WAR4_avg = plyr::mapvalues(POS, from = group_avgs_i$POS,
                                        to = group_avgs_i$WAR4_avg) %>% as.numeric(),
             JAWS_avg = plyr::mapvalues(POS, from = group_avgs_i$POS,
                                        to = group_avgs_i$JAWS_avg) %>% as.numeric()) %>% 
      bind_rows(group_avgs_i)

# Prepare dataframe for JAWS dot chart
dot_table <- display_table %>% 
      rename(JAWS_avg = `Wt Avg JAWS4`, WAR_avg = `Wt Avg WAR`) %>% 
      bind_rows(group_avgs)

jaws_group <- dot_table %>% 
      select(Name, POS, JAWS4, JAWS_avg) %>% 
      rename(Group = POS, `Avg HOF` = JAWS_avg) %>% 
      gather(key = "Stat", value = "Value", -c(Name, Group))

# Prepare dataframe for WAR dot chart
war_group <- dot_table %>% 
      select(Name, POS, WARtenure, WAR_avg) %>% 
      rename(Group = POS, `Avg HOF` = WAR_avg, WAR = WARtenure) %>% 
      gather(key = "Stat", value = "Value", -c(Name, Group))

glimpse(war_group)
## Observations: 368
## Variables: 4
## $ Name  <chr> "Aaron Boone", "Adam Dunn", "Barry Larkin", "Bid McPhee"...
## $ Group <chr> "3B", "LF", "SS", "2B", "SS", "P", "P", "P", "C", "P", "...
## $ Stat  <chr> "WAR", "WAR", "WAR", "WAR", "WAR", "WAR", "WAR", "WAR", ...
## $ Value <dbl> 11.61, 16.44, 70.17, 52.39, 13.55, 29.07, 22.26, 7.43, 1...

Interactive Line Chart

The final visual will be a line graph of player WAR values for each season played with the Reds. We’ll add some emphasis to the four largest WAR values and a horizontal line to indicate a typical Hall of Famer.

# WAR4 + years; add type column
war4Dat <- warDat %>%
      group_by(playerId) %>%
      top_n(4, rWAR) %>% 
      ungroup() %>% 
      select(-teamId) %>% 
      add_column(type = rep("WAR4", 328))

# Not WAR4 + years; add type column
notWar4 <- warDat %>% 
      anti_join(war4Dat, by = c("playerId", "yearId")) %>% 
      select(-teamId) %>% 
      add_column(type = rep("WAR", 427))

war_combined <- notWar4 %>% 
      bind_rows(war4Dat)


#  Positional and Pitcher seasonal average WAR values

pitMedWar <- war_combined %>% 
      filter(POS == "P") %>% 
      summarize(`Median Pitcher WAR` = median(rWAR))

posMedWAR <- war_combined %>% 
      filter(POS != "P") %>% 
      summarize(`Median Position WAR` = median(rWAR))

war_combo_avg <- war_combined %>% 
      mutate(`Median WAR` = if_else(POS == "P",
                                    pitMedWar$`Median Pitcher WAR`[[1]],
                                    posMedWAR$`Median Position WAR`[[1]])) %>% 
      rename(bbref_id = playerId, WAR = rWAR) %>% 
      select(bbref_id, Name, everything())

Save Objects

Only four objects will be required for the final post in this series: display_table, jaws_group, war_group, and war_combo_avg.

Conclusion

In this post, we calculated the JAWS statistic and positional averages that can be used to evaluate nominees and compare members of the Reds Hall of Fame. We also created positional group averages which could come in handy in certain situations such as with players that were more versatile and played multiple positions throughout their career. Lastly, we produced the data sets that will be used in our visuals for part three.

References

[1] C. Boettiger. knitcitations: Citations for ‘Knitr’ Markdown Files. R package version 1.0.8. 2017. URL: https://CRAN.R-project.org/package=knitcitations.

[2] H. Wickham. “The Split-Apply-Combine Strategy for Data Analysis”. In: Journal of Statistical Software 40.1 (2011), pp. 1–29. URL: http://www.jstatsoft.org/v40/i01/.

[3] H. Wickham. tidyverse: Easily Install and Load the ‘Tidyverse’. R package version 1.2.1. 2017. URL: https://CRAN.R-project.org/package=tidyverse.