MLB Superstar 2024 Pitch-By-Pitch Breakdown

Author

Cooper Olney

Published

June 26, 2025

Motivation

Baseball is often described as a game of matchups, a sport in which each pitch creates a mini battle between pitcher and batter. In Major League Baseball (MLB), every pitch of an at-bat, whether it is taken, swung through, fouled off or put in play, is recorded in detail. This level of tracking allows us to break down every plate appearance using metrics that capture both the mechanics and the outcomes of each moment at the plate.

In the 2024 season, few players drew more attention than Aaron Judge, Shohei Ohtani and Mookie Betts, three of MLB's premier stars known for their elite hitting skills. These players serve as ideal case studies for examining how top-tier hitters handle different pitch types, counts and in-game situations over a full season. By analyzing pitch-by-pitch data for these superstars, we can shed light on what sets them apart at the plate and on the patterns or tendencies that emerge over time.

Data

Below are three datasets containing pitch-by-pitch information from the 2024 at-bats of Aaron Judge (2885 rows), Shohei Ohtani (2808 rows) and Mookie Betts (2000 rows). Each row is a single pitch, and each dataset includes the same 15 variables, which describe the pitch thrown, the result of the pitch and the game situation at that moment.
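Because each row is a single pitch, only the last pitch of a plate appearance carries a non-missing events value, so plate appearances can be counted from that column. The sketch below illustrates this on a small, invented data frame standing in for judge, ohtani or betts; the values are hypothetical, and it assumes events is either NA or an empty string on pitches that did not end the plate appearance.

```r
# Hypothetical toy data: three plate appearances (3, 2 and 1 pitches) for one hitter.
# Values are invented for illustration, not taken from Statcast.
toy <- data.frame(
  pitch_number = c(1, 2, 3, 1, 2, 1),
  events = c(NA, "", "single", NA, "strikeout", "home_run"),
  stringsAsFactors = FALSE
)

# Treat empty strings as missing, in case events comes back as "" on non-terminal pitches
toy$events[which(toy$events == "")] <- NA

n_pitches <- nrow(toy)
n_pa <- sum(!is.na(toy$events))  # one non-missing event per plate appearance
pitches_per_pa <- n_pitches / n_pa

cat("Pitches:", n_pitches, "\n")
cat("Plate appearances:", n_pa, "\n")
cat("Pitches per PA:", pitches_per_pa, "\n")
```

Applied to a real dataset, the same count separates the pitch-level row totals quoted above from the number of plate appearances each hitter had.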

Data courtesy of baseballsavant.mlb.com and accessed using the baseballr package.

Variable Descriptions
Variable Description
game_date The date the game took place.
inning The inning during which the pitch occurred.
outs_when_up The number of outs when the batter stepped up to the plate.
pitch_number The number of the pitch within the plate appearance.
pitch_name The classification of the pitch thrown (e.g., 4-Seam Fastball, Slider).
p_throws The pitcher’s throwing hand (R for right, L for left).
description Text description of the pitch outcome (e.g., swinging_strike, ball, foul).
events The result of the plate appearance (e.g., single, field_out, walk); missing for pitches that did not end the plate appearance.
launch_speed The speed (in mph) of the ball off the bat when contact is made.
launch_angle The vertical angle (in degrees) at which the ball leaves the bat.
hit_distance_sc The distance (in feet) the ball traveled as measured or estimated by Statcast.
launch_speed_angle Classification bin (1-6) based on the quality of contact, combining launch speed and launch angle to estimate how well the ball was hit: 1 = Weak, 2 = Topped, 3 = Under, 4 = Flare/Burner, 5 = Solid Contact, 6 = Barrel. Higher values generally indicate better contact and a greater likelihood of a positive outcome.
zone The strike zone location of the pitch from the catcher's viewpoint. Zones 1-9 are within the strike zone, and 11-14 are just outside it. 10 is well outside the strike zone.
release_speed The speed (in mph) of the pitch as it leaves the pitcher’s hand.
release_spin_rate The spin rate (in RPM) of the ball as it leaves the pitcher’s hand.
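The zone and launch_speed_angle codes are easier to work with once decoded. A minimal sketch, using a few invented rows rather than real Statcast data, flags in-zone pitches (codes 1-9, per the table above) and attaches the contact-quality labels to the numeric bins. The factor levels simply follow the numeric code order.

```r
# Statcast launch_speed_angle codes 1-6, in numeric order
lsa_labels <- c("Weak", "Topped", "Under", "Flare/Burner", "Solid Contact", "Barrel")

# Hypothetical example rows (invented values, not real Statcast data)
toy <- data.frame(
  zone = c(5, 1, 11, 14, 9),
  launch_speed_angle = c(6, 2, NA, 4, 5)
)

# Zones 1-9 are inside the strike zone; any other code is treated as outside
toy$in_zone <- toy$zone %in% 1:9

# Attach readable labels; indexing with NA (no batted ball) yields NA
toy$contact_quality <- factor(lsa_labels[toy$launch_speed_angle], levels = lsa_labels)

print(toy)
```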

Questions

  1. Are some pitch types more likely to result in home runs than others for each hitter? Investigate whether the outcome of a plate appearance (from the events variable) differs by pitch type. Do certain hitters have more success against fastballs or against off-speed pitches?

  2. How does launch_speed differ across pitch zones? Explore whether certain areas of the strike zone tend to produce harder contact. Do the three hitters have different “sweet spots” where they generate higher exit velocities?
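For question 1, one way to start is a home run rate per hitter and pitch type, computed only on the pitches that ended a plate appearance. The sketch below uses invented toy rows in place of the real datasets, and its fastball/off-speed grouping (4-Seam Fastball, Sinker and Cutter counted as fastballs, everything else as off-speed) is an assumption rather than an official classification.

```r
# Hypothetical toy data for two hitters (invented values, not real Statcast rows).
# The NA row stands in for a pitch that did not end the plate appearance.
pa <- data.frame(
  player = c("Judge", "Judge", "Judge", "Judge", "Ohtani", "Ohtani", "Ohtani", "Ohtani"),
  pitch_name = c("4-Seam Fastball", "4-Seam Fastball", "Slider", "Slider",
                 "4-Seam Fastball", "Slider", "Slider", "Sinker"),
  events = c("home_run", "field_out", "strikeout", "home_run",
             "single", "home_run", "field_out", NA),
  stringsAsFactors = FALSE
)

# Keep only the pitches that ended a plate appearance
pa <- pa[!is.na(pa$events), ]
pa$is_hr <- pa$events == "home_run"

# Home run rate and number of plate appearances by hitter and pitch type
hr_rate <- aggregate(is_hr ~ player + pitch_name, data = pa, FUN = mean)
hr_rate$n_pa <- aggregate(is_hr ~ player + pitch_name, data = pa, FUN = length)$is_hr

# One possible fastball/off-speed split (an assumption; adjust the list as needed)
fastball_types <- c("4-Seam Fastball", "Sinker", "Cutter")
pa$pitch_group <- ifelse(pa$pitch_name %in% fastball_types, "Fastball", "Off-speed")
group_rate <- aggregate(is_hr ~ player + pitch_group, data = pa, FUN = mean)

print(hr_rate)
print(group_rate)
```

The denominator here is plate appearances that ended on a given pitch type, not every pitch of that type seen; reporting n_pa alongside each rate helps flag pitch types with too few plate appearances to compare.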
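For question 2, average launch_speed can be summarized by hitter and zone, and each hitter's "sweet spot" taken as the zone with the highest mean exit velocity. This is a minimal sketch on invented toy rows, not the real datasets.

```r
# Hypothetical toy data: pitches with exit velocity for two hitters (invented values)
bb <- data.frame(
  player = c("Judge", "Judge", "Judge", "Betts", "Betts", "Betts"),
  zone = c(5, 5, 13, 5, 4, 4),
  launch_speed = c(110.2, 105.8, 88.0, 99.5, 101.0, NA),
  stringsAsFactors = FALSE
)

# Drop pitches with no measured exit velocity (e.g. takes and swinging strikes)
bb <- bb[!is.na(bb$launch_speed), ]

# Mean exit velocity and number of measured balls per hitter and zone
ev_by_zone <- aggregate(launch_speed ~ player + zone, data = bb, FUN = mean)
ev_by_zone$n <- aggregate(launch_speed ~ player + zone, data = bb, FUN = length)$launch_speed

# Each hitter's "sweet spot": the zone with the highest mean exit velocity
sweet_spot <- do.call(rbind, lapply(split(ev_by_zone, ev_by_zone$player), function(d) {
  d[which.max(d$launch_speed), ]
}))
rownames(sweet_spot) <- NULL

print(ev_by_zone)
print(sweet_spot)
```

With the real data it is worth requiring a minimum n per zone before naming a sweet spot, since a zone with only one or two hard-hit balls can top the table by chance.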

References

Petti B, Gilani S (2022). baseballr: Acquiring and Analyzing Baseball Data. R package version 1.3.0, https://CRAN.R-project.org/package=baseballr.

Code



# Install baseballr if needed
# if (!requireNamespace('devtools', quietly = TRUE)){
#  install.packages('devtools')
# }
# devtools::install_github(repo = "BillPetti/baseballr")


library(baseballr)
library(tidyverse)

### Aaron Judge

# Check for playerid
playerid_lookup(last_name = "Judge",
                first_name = "Aaron")

judge <-
  scrape_statcast_savant(
    start_date = "2024-03-28",
    end_date = "2024-09-29",
    playerid = 592450,
    player_type = "batter"
  ) %>%
  select(
    game_date, inning, outs_when_up, pitch_number, pitch_name,
    p_throws, description, events, launch_speed, launch_angle,
    hit_distance_sc, launch_speed_angle, zone,
    release_speed, release_spin_rate
  ) %>%
  # Keep every pitch (filtering on events would keep only the last pitch of
  # each plate appearance); just turn empty strings into NA
  mutate(events = na_if(events, ""))

### Shohei Ohtani

# check for playerid
playerid_lookup(last_name = "Ohtani",
                first_name = "Shohei")

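# Note: the Dodgers opened 2024 with two games against San Diego in Seoul
# (March 20-21), which fall before this start date and are not included.
ohtani <-
  scrape_statcast_savant(
    start_date = "2024-03-28",
    end_date = "2024-09-29",
    playerid = 660271,
    player_type = "batter"
  ) %>%
  select(
    game_date, inning, outs_when_up, pitch_number, pitch_name,
    p_throws, description, events, launch_speed, launch_angle,
    hit_distance_sc, launch_speed_angle, zone,
    release_speed, release_spin_rate
  ) %>%
  # Treat empty strings as missing so events is NA on non-terminal pitches
  mutate(events = na_if(events, ""))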

### Mookie Betts

# check for playerid
playerid_lookup(last_name = "Betts",
                first_name = "Mookie")

betts <-
  scrape_statcast_savant(
    start_date = "2024-03-28",
    end_date = "2024-09-29",
    playerid = 605141,
    player_type = "batter"
  ) %>%
  select(
    game_date, inning, outs_when_up, pitch_number, pitch_name,
    p_throws, description, events, launch_speed, launch_angle,
    hit_distance_sc, launch_speed_angle, zone,
    release_speed, release_spin_rate
  ) %>%
  # Treat empty strings as missing so events is NA on non-terminal pitches
  mutate(events = na_if(events, ""))
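Both questions compare the three hitters, so it can be convenient to stack the three data frames into one with a column naming the hitter. The helper below, stack_hitters, is hypothetical (not part of baseballr) and is shown on small invented stand-ins for judge, ohtani and betts; in the tidyverse, dplyr::bind_rows() with its .id argument does the same job.

```r
# Hypothetical helper: stack per-hitter data frames and record each hitter's name
stack_hitters <- function(hitters) {
  stacked <- do.call(rbind, Map(function(df, name) {
    df$player <- name
    df
  }, hitters, names(hitters)))
  rownames(stacked) <- NULL
  stacked
}

# Toy stand-ins for judge, ohtani and betts (invented values)
toy_judge  <- data.frame(pitch_name = "Slider", launch_speed = 104.1)
toy_ohtani <- data.frame(pitch_name = "Sinker", launch_speed = NA_real_)
toy_betts  <- data.frame(pitch_name = "Changeup", launch_speed = 95.3)

all_hitters <- stack_hitters(list(Judge = toy_judge, Ohtani = toy_ohtani, Betts = toy_betts))
print(all_hitters)

# With the scraped data: stack_hitters(list(Judge = judge, Ohtani = ohtani, Betts = betts))
```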