MLB Superstars 2024: Pitch-by-Pitch Breakdown
Motivation
Baseball is often described as a game of matchups, a sport where each pitch creates a mini battle between pitcher and batter. In Major League Baseball (MLB), every pitch of a plate appearance, whether the batter takes it, swings and misses, or makes contact, is recorded in detail. This level of tracking allows us to break down every plate appearance using metrics that capture both the mechanics and the outcomes of each moment at the plate.
In the 2024 season, few players drew more attention than Aaron Judge, Shohei Ohtani and Mookie Betts, three of MLB’s premier stars known for their elite hitting. These players serve as ideal case studies for examining how top-tier hitters handle different pitch types, counts and in-game situations over a season. By analyzing pitch-by-pitch data for these superstars, we can shed light on what sets them apart at the plate and what patterns or tendencies emerge over time.
Data
Below are three datasets containing pitch-by-pitch information from the 2024 plate appearances of Aaron Judge (2,885 rows), Shohei Ohtani (2,808 rows), and Mookie Betts (2,000 rows). Each row is one pitch, and each dataset includes 15 variables describing the pitch thrown, the result of the pitch and the game situation at that moment.
Data courtesy of Baseball Savant (baseballsavant.mlb.com), accessed using the baseballr package (Petti and Gilani 2022).
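Because each row is a single pitch, the number of plate appearances has to be derived rather than read off the row count. A minimal base-R sketch, assuming `events` is empty or `NA` on every pitch except the one that ends a plate appearance; `summarize_hitter()` and the toy rows are hypothetical and are not real 2024 data:

```r
# Hypothetical toy data: two plate appearances spread over five pitches.
toy_pitches <- data.frame(
  pitch_number = c(1, 2, 3, 1, 2),
  events       = c(NA, NA, "strikeout", NA, "home_run")
)

# Count pitches (rows) and plate appearances (rows with a non-empty `events`).
summarize_hitter <- function(df) {
  c(
    pitches = nrow(df),
    plate_appearances = sum(!is.na(df$events) & df$events != "")
  )
}

summarize_hitter(toy_pitches)  # 5 pitches, 2 plate appearances
```

Applied to the real data, `sapply(list(judge = judge, ohtani = ohtani, betts = betts), summarize_hitter)` would return a small matrix with one column per hitter.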
Variable Descriptions
Variable | Description |
---|---|
game_date | The date the game took place. |
inning | The inning during which the pitch occurred. |
outs_when_up | The number of outs at the time of the pitch. |
pitch_number | The number of the pitch within the plate appearance (1 for the first pitch). |
pitch_name | The classification of the pitch thrown (e.g., 4-Seam Fastball, Slider). |
p_throws | The pitcher’s throwing hand (R for right, L for left). |
description | Text description of the pitch outcome (e.g., swinging_strike, ball, foul). |
events | The result of the plate appearance (e.g., single, field_out, walk), recorded only on the pitch that ends the plate appearance and left empty on all other pitches. |
launch_speed | The speed (in mph) of the ball off the bat when contact is made. |
launch_angle | The vertical angle (in degrees) at which the ball leaves the bat. |
hit_distance_sc | The distance (in feet) the ball traveled as measured or estimated by Statcast. |
launch_speed_angle | Classification bin (1-6) of contact quality that combines launch speed and launch angle to estimate how well the ball was hit (1 = weak, 6 = barrel). Higher values generally indicate better contact and a greater likelihood of a positive outcome. |
zone | The location of the pitch as it crosses the plate, given as a zone number from the catcher’s viewpoint. Zones 1-9 are inside the strike zone; 11-14 are the four regions outside it. |
release_speed | The speed (in mph) of the pitch as it leaves the pitcher’s hand. |
release_spin_rate | The spin rate (in RPM) of the ball as it leaves the pitcher’s hand. |
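Two of these columns are numeric codes, and the questions below contrast fastballs with other pitches, so it can help to recode them into readable groups. The sketch below is hypothetical: the helper names are made up, the `launch_speed_angle` labels follow Baseball Savant's documentation, and treating 4-seam fastballs, sinkers and cutters as the "fastball" group is an assumption that mirrors the usual Statcast pitch grouping.

```r
# Strike-zone region: codes 1-9 are inside the zone; any other non-missing code is treated as outside.
zone_region <- function(zone) {
  ifelse(is.na(zone), NA_character_,
         ifelse(zone %in% 1:9, "in_zone", "out_of_zone"))
}

# Baseball Savant's labels for launch_speed_angle codes 1-6, in order.
lsa_labels <- c("Weak", "Topped", "Under", "Flare/Burner", "Solid Contact", "Barrel")
label_lsa <- function(code) {
  # as.integer() keeps a lone NA from being recycled across all six labels
  factor(lsa_labels[as.integer(code)], levels = lsa_labels)
}

# Assumed grouping: 4-seamers, sinkers and cutters count as fastballs, everything else does not.
fastball_names <- c("4-Seam Fastball", "Sinker", "Cutter")
pitch_family <- function(pitch_name) {
  ifelse(is.na(pitch_name), NA_character_,
         ifelse(pitch_name %in% fastball_names, "fastball", "non_fastball"))
}

zone_region(c(5, 12))
label_lsa(c(6, 1))
pitch_family(c("Sinker", "Sweeper"))
```

With dplyr loaded, these drop straight into a pipeline, e.g. `mutate(judge, region = zone_region(zone), contact = label_lsa(launch_speed_angle))`.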
Questions
Are some pitch types more likely to result in home runs than others for each hitter? Investigate whether the outcome of a plate appearance (the `events` variable) differs by pitch type. Do certain hitters have more success against fastballs or off-speed pitches?
How does `launch_speed` differ across pitch zones? Explore whether certain areas of the strike zone tend to produce harder contact. Do the three hitters have different “sweet spots” where they generate higher exit velocities?
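Both questions reduce to grouped summaries. A minimal base-R sketch, assuming `events == "home_run"` marks home runs and `description == "hit_into_play"` marks balls put in play (foul balls can also carry a `launch_speed`, so they are excluded here); the function names and toy rows are hypothetical, not real 2024 values:

```r
# Hypothetical toy rows standing in for one hitter's pitch-level data.
toy <- data.frame(
  pitch_name   = c("4-Seam Fastball", "4-Seam Fastball", "Slider", "Slider", "Slider"),
  description  = c("hit_into_play", "ball", "hit_into_play", "swinging_strike", "hit_into_play"),
  events       = c("home_run", NA, "field_out", "strikeout", "home_run"),
  launch_speed = c(110.2, NA, 95.0, NA, 105.0),
  zone         = c(5, 12, 5, 14, 8)
)

# Question 1: home-run rate per pitch type, over pitches that ended a plate appearance.
hr_rate_by_pitch <- function(df) {
  pa <- df[!is.na(df$events) & df$events != "", ]
  n_pa <- tapply(pa$events, pa$pitch_name, length)
  n_hr <- tapply(pa$events == "home_run", pa$pitch_name, sum)
  out <- data.frame(
    pitch_name = names(n_pa),
    pa = as.integer(n_pa),
    hr = as.integer(n_hr),
    row.names = NULL
  )
  out$hr_rate <- out$hr / out$pa
  out[order(-out$hr_rate), ]
}

# Question 2: mean exit velocity per zone, over balls put in play.
ev_by_zone <- function(df) {
  bip <- df[df$description %in% "hit_into_play" & !is.na(df$launch_speed), ]
  out <- aggregate(launch_speed ~ zone, data = bip, FUN = mean)
  out$n <- as.integer(table(bip$zone)[as.character(out$zone)])
  out[order(-out$launch_speed), ]
}

hr_rate_by_pitch(toy)
ev_by_zone(toy)
```

To compare hitters, apply each function to every dataset, e.g. `lapply(list(judge = judge, ohtani = ohtani, betts = betts), ev_by_zone)`. Groups with only a handful of plate appearances or balls in play give noisy rates, so the `pa` and `n` columns are worth keeping next to each average.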
References
Petti B, Gilani S (2022). baseballr: Acquiring and Analyzing Baseball Data. R package version 1.3.0, https://CRAN.R-project.org/package=baseballr.
Code
```r
# Install baseballr if needed. The CRAN release:
# install.packages("baseballr")
# or the development version from GitHub:
# if (!requireNamespace("devtools", quietly = TRUE)) {
#   install.packages("devtools")
# }
# devtools::install_github(repo = "BillPetti/baseballr")
library(baseballr)
library(tidyverse)

### Aaron Judge
# Look up the MLBAM ID to pass as `playerid` (592450 for Judge)
playerid_lookup(last_name = "Judge",
                first_name = "Aaron")

judge <-
  scrape_statcast_savant(
    start_date = "2024-03-28",
    end_date = "2024-09-29",
    playerid = 592450,
    player_type = "batter"
  ) %>%
  select(
    game_date, inning, outs_when_up, pitch_number, pitch_name,
    p_throws, description, events, launch_speed, launch_angle,
    hit_distance_sc, launch_speed_angle, zone,
    release_speed, release_spin_rate
  ) %>%
  # `events` is blank on every pitch except the last one of a plate appearance;
  # recode blanks to NA but keep all pitches (one row per pitch)
  mutate(events = if_else(events == "", NA_character_, events))

### Shohei Ohtani
# Look up the MLBAM ID to pass as `playerid` (660271 for Ohtani)
playerid_lookup(last_name = "Ohtani",
                first_name = "Shohei")

# Note: this window starts after the Dodgers' Seoul Series games (March 20-21, 2024)
ohtani <-
  scrape_statcast_savant(
    start_date = "2024-03-28",
    end_date = "2024-09-29",
    playerid = 660271,
    player_type = "batter"
  ) %>%
  select(
    game_date, inning, outs_when_up, pitch_number, pitch_name,
    p_throws, description, events, launch_speed, launch_angle,
    hit_distance_sc, launch_speed_angle, zone,
    release_speed, release_spin_rate
  ) %>%
  # Same blank-to-NA recoding as for Judge, keeping every pitch
  mutate(events = if_else(events == "", NA_character_, events))

### Mookie Betts
# Look up the MLBAM ID to pass as `playerid` (605141 for Betts)
playerid_lookup(last_name = "Betts",
                first_name = "Mookie")

# Note: this window starts after the Dodgers' Seoul Series games (March 20-21, 2024)
betts <-
  scrape_statcast_savant(
    start_date = "2024-03-28",
    end_date = "2024-09-29",
    playerid = 605141,
    player_type = "batter"
  ) %>%
  select(
    game_date, inning, outs_when_up, pitch_number, pitch_name,
    p_throws, description, events, launch_speed, launch_angle,
    hit_distance_sc, launch_speed_angle, zone,
    release_speed, release_spin_rate
  ) %>%
  # Same blank-to-NA recoding as for Judge, keeping every pitch
  mutate(events = if_else(events == "", NA_character_, events))
```
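For side-by-side comparisons it is often easier to work with one long data frame than with three separate ones. A minimal sketch, assuming R 4.0 or later and data frames shaped like the three objects created above; `blank_to_na()` and `stack_hitters()` are hypothetical helpers, the toy rows are made up, and `dplyr::bind_rows(..., .id = "batter")` is a tidyverse alternative for the stacking step:

```r
# Replace empty strings with NA in every character column.
blank_to_na <- function(df) {
  is_chr <- vapply(df, is.character, logical(1))
  df[is_chr] <- lapply(df[is_chr], function(x) {
    x[!is.na(x) & x == ""] <- NA
    x
  })
  df
}

# Stack a named list of per-hitter data frames, adding a `batter` column from the names.
stack_hitters <- function(hitters) {
  labeled <- Map(function(df, name) {
    df <- blank_to_na(df)
    df$batter <- name
    df
  }, hitters, names(hitters))
  out <- do.call(rbind, labeled)
  rownames(out) <- NULL
  out
}

# Hypothetical toy inputs with the same blank-string convention.
judge_toy <- data.frame(events = c("", "home_run"), launch_speed = c(NA, 112.4))
betts_toy <- data.frame(events = c("walk", ""), launch_speed = c(NA, 88.1))
combined <- stack_hitters(list(judge = judge_toy, betts = betts_toy))
combined
```

With the real data this would be `hitters <- stack_hitters(list(judge = judge, ohtani = ohtani, betts = betts))`, after which any grouped summary can also be split by `batter`.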