10 MLB Pitchers 2015-2019

Histograms
Summary statistics
Proportions
Author

Phoebe Jones

Published

June 26, 2025

Motivation

Baseball is an extremely data-rich sport. Because play unfolds as a series of discrete events (pitches, at-bats, innings) rather than as continuous action, data can be collected at nearly every point in a game, which makes the sport especially well suited to analysis.

Baseball Background

In baseball, the game is split into innings. There are usually nine innings, each of which is split into a top and a bottom. In the top of the inning, the home team pitches (throws the baseball) and the away team bats (attempts to hit the thrown baseball with a bat); they switch for the bottom of the inning. A team can score runs only during its turn batting. A run scores when a player advances around the three bases and back to home plate (where they bat from). At the most basic level, players are separated into two categories: pitchers and batters. This data set deals only with pitchers, the players responsible for throwing the pitches that the batter must hit to advance around the field.

Pitchers

The pitchers included in this data set are 10 standout pitchers from the last decade: Clayton Kershaw, Justin Verlander, Max Scherzer, Chris Sale, Madison Bumgarner, Cole Hamels, Zack Greinke, David Price, Jacob deGrom, and Jon Lester. To avoid problems resulting from the shortened 2020 season (due to COVID-19) as well as any seasons that certain pitchers missed due to injury (for example, Justin Verlander in 2021), this data set covers the 2015-2019 seasons. In these years, each pitcher threw over 2000 pitches per season, providing ample data.

Data

The pitchers data set contains 33,289 rows and 7 columns.

Each row in the pitchers data set represents a pitch thrown by one of the pitchers during the 2015-2019 seasons. The pitch type is recorded as an abbreviation. The most common types are four-seam fastballs (FF), sinkers (SI), changeups (CH), sliders (SL), and curveballs (CU). Most of the rest of the categorizations can be found here. The result column is also in shorthand, with B meaning ball, S meaning strike, and X meaning in-play.
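As a minimal sketch of how the shorthand might be expanded for plots or tables, the lookup vectors below map the codes listed above to readable labels. The names `pitch_labels`, `result_labels`, and `toy` are illustrative assumptions, and the toy rows are made up rather than taken from the data set. It uses base R only.

```r
# Lookup vectors built from the abbreviations described above.
pitch_labels <- c(
  FF = "four-seam fastball",
  SI = "sinker",
  CH = "changeup",
  SL = "slider",
  CU = "curveball"
)
result_labels <- c(B = "ball", S = "strike", X = "in play")

# Toy rows for illustration only (made-up values, not real pitches).
# "FC" stands in for any code that is not in the lookup.
toy <- data.frame(
  pitch_type = c("FF", "SL", "CU", "FC"),
  result     = c("S", "B", "X", "S"),
  stringsAsFactors = FALSE
)

# Indexing a named vector by code returns the matching label;
# codes missing from the lookup become NA.
toy$pitch_label  <- unname(pitch_labels[toy$pitch_type])
toy$result_label <- unname(result_labels[toy$result])

print(toy)
```

Leaving unknown codes as `NA` makes it easy to spot pitch types that still need a label before summarizing.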

Variable Descriptions
Variable        Description
player_name     Pitcher name
pitch_type      Type of pitch in shorthand
result          Shorthand for result of pitch (B, S, or X)
release_speed   Out-of-hand pitch velocity (mph)
description     Short description of pitch outcome
count           Ball-strike count at time of pitch, written balls-strikes
outs_when_up    Number of outs at time of pitch

Download Pitching Data: pitchers.csv

Questions

  1. Are some pitchers more likely to throw certain types of pitches?
  2. Which player throws the fastest, on average?
  3. Does the out count have an impact on the type of pitch thrown?
  4. Can you predict how fast a pitch will be based on the pitcher, if you account for pitch type?
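One possible starting point for these questions is sketched below. It is not the intended solution: the toy tibble uses made-up pitchers "A" and "B" and made-up speeds so that the block runs on its own, and the object names (`toy`, `pitch_mix`, `avg_speed`, `fit`) are assumptions. With the real data, `pitchers` would take the place of `toy`.

```r
library(dplyr)

# Toy data for illustration only (made-up values, not real pitches).
toy <- tibble(
  player_name   = c("A", "A", "A", "A", "B", "B", "B", "B"),
  pitch_type    = c("FF", "FF", "SL", "CU", "FF", "SL", "SL", "SL"),
  release_speed = c(95, 94, 86, 78, 92, 84, 85, 83)
)

# Question 1: share of each pitch type within each pitcher.
pitch_mix <- toy |>
  count(player_name, pitch_type) |>
  group_by(player_name) |>
  mutate(prop = n / sum(n)) |>
  ungroup()

# Question 2: mean release speed per pitcher, fastest first.
avg_speed <- toy |>
  group_by(player_name) |>
  summarise(mean_speed = mean(release_speed, na.rm = TRUE)) |>
  arrange(desc(mean_speed))

# Question 4: linear model of speed on pitcher, adjusting for pitch type.
fit <- lm(release_speed ~ player_name + pitch_type, data = toy)

print(pitch_mix)
print(avg_speed)
print(coef(fit))
```

Question 3 could follow the same pattern as Question 1, with `outs_when_up` in place of `player_name` as the grouping variable.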

References

Petti B, Gilani S (2025). baseballr: Acquiring and Analyzing Baseball Data. R package version 1.6.0, commit af84f6deaf5115490791936abcbf11f3586b4597, https://github.com/mlascaleia/baseballr.

Code


# Install baseballr if needed
# if (!requireNamespace('devtools', quietly = TRUE)){
#  install.packages('devtools')
# }
# devtools::install_github(repo = "BillPetti/baseballr")

library(tidyverse)
library(baseballr)

# download each pitcher's data
# See https://billpetti.github.io/baseballr/articles/using_statcast_pitch_data.html#find-corbin-burnes-mlbam-id
# for an example on how to get a specific player's id
CK <- statcast_search_pitchers(start_date = "2015-04-05", end_date = "2019-10-30", pitcherid = 477132)  # Clayton Kershaw
JV <- statcast_search_pitchers(start_date = "2015-04-05", end_date = "2019-10-30", pitcherid = 434378)  # Justin Verlander
MS <- statcast_search_pitchers(start_date = "2015-04-05", end_date = "2019-10-30", pitcherid = 453286)  # Max Scherzer
CS <- statcast_search_pitchers(start_date = "2015-04-05", end_date = "2019-10-30", pitcherid = 519242)  # Chris Sale
MB <- statcast_search_pitchers(start_date = "2015-04-05", end_date = "2019-10-30", pitcherid = 518516)  # Madison Bumgarner
CH <- statcast_search_pitchers(start_date = "2015-04-05", end_date = "2019-10-30", pitcherid = 430935)  # Cole Hamels
ZG <- statcast_search_pitchers(start_date = "2015-04-05", end_date = "2019-10-30", pitcherid = 425844)  # Zack Greinke
DP <- statcast_search_pitchers(start_date = "2015-04-05", end_date = "2019-10-30", pitcherid = 456034)  # David Price
JD <- statcast_search_pitchers(start_date = "2015-04-05", end_date = "2019-10-30", pitcherid = 594798)  # Jacob deGrom
JL <- statcast_search_pitchers(start_date = "2015-04-05", end_date = "2019-10-30", pitcherid = 452657)  # Jon Lester

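# stack the ten pitchers' data and keep only the columns used in the data set
pitchers <- 
  bind_rows(CK, JV, MS, CS, MB, CH, ZG, DP, JD, JL) |> 
  # rename Statcast's `type` column to `result` while selecting
  select(player_name, pitch_type, result = type, 
         release_speed, description, 
         balls, strikes, outs_when_up) |> 
  # combine balls and strikes into a single "balls-strikes" count column
  unite(count, balls, strikes, sep = "-")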
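
# To make the unite() step above concrete, the sketch below applies it to a
# few made-up rows and then reverses it with separate(). The toy values and
# the names toy, with_count, and split_again are illustrative assumptions,
# not part of the original pipeline.

```r
library(tidyr)

# Toy rows for illustration only (made-up values, not real pitches).
toy <- data.frame(
  balls        = c(0, 3, 1),
  strikes      = c(0, 2, 2),
  outs_when_up = c(0, 2, 1)
)

# unite() pastes balls and strikes into one character column, balls first,
# and drops the two original columns by default.
with_count <- unite(toy, count, balls, strikes, sep = "-")

# separate() reverses the step; convert = TRUE turns the pieces back
# into integers instead of leaving them as character strings.
split_again <- separate(with_count, count, into = c("balls", "strikes"),
                        sep = "-", convert = TRUE)

print(with_count)
print(split_again)
```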