National Hockey League Shots

classification
logistic regression
generalized additive models
multinomial logistic regression
naive bayes classifier
density estimation
Information about shots during the 2021-2022 NHL season
Author

Ron Yurko

Published

March 23, 2023

Motivation

The National Hockey League (NHL) is the top professional men’s hockey league in the world. The league records every shot players take along with contextual information about the shot such as its location, the player’s distance and angle to the goal when attempting the shot, as well as the outcome (blocked, missed, or goal). Using this information, the hockey analytics community have developed measures of shot quality known as expected goals. With this dataset, you can create your own expected goals model to predict the shot outcome given relevant features.

Data

This dataset contains information about 160,573 shots during the 2021-2022 NHL season.

The data was collected using the hockeyR package in R.

nhl-shots-2021.csv.gz
Variable Description
game_id Unique integer identifier for game shot took place in
description String detailed description of shot event
shot_outcome String denoting the outcome of the shot, either BLOCKED_SHOT (meaning blocked by a non-goalie), GOAL, MISSED_SHOT (shot that missed the net), or SHOT (shot on net that was saved by a goalie)
period Integer value of the game period
period_seconds_remaining Numeric value of the seconds remaining in the period
game_seconds_remaining Numeric value of the seconds remaining in the game; negative for overtime periods
home_score Integer value of the home team score after the event
away_score Integer value of the away team score after the event
home_name String name of the home team
away_name String name of the away team
event_team String defining the team taking the shot
event_goalie_name String name of goalie (if in net)
empty_net Boolean indicating if the shot was during an empty net situation, TRUE if so but FALSE or NA if not
event_player_1_name String name of the primary event player
event_player_1_type String indicator for the role of event_player_1 (typically the shooter)
event_player_2_name String name of the secondary event player
event_player_2_type String indicator for the role of event_player_2 (blocker, assist, or goalie)
strength_code String indicator for game strength: EV (Even), SH (Shorthanded), or PP (Power Play)
x_fixed Numeric transformed x-coordinate of event in feet, where the home team always shoots to the right, away team to the left
y_fixed Numeric transformed y-coordinate of event in feet, where the home team always shoots to the right, away team to the left
shot_distance Numeric distance (in feet) to center of net for unblocked shot events
shot_angle Numeric angle (in degrees) to center of net for unlocked shot events

Questions

  1. Build a logistic regression model to predict whether or not the shot will result in a goal based on the shot distance and angle.

  2. Build a classification model to predict the outcome based on the spatial x,y coordinates of the shot.

  3. Create a visualization displaying the joint frequency of shot locations. Do there appear to be any clear modes of frequently taken shots? Create a conditional version of this display by shot outcome. Does the distribution shape vary by shot outcome? (You can also perform a similar analysis by team).

References

Morse D (2023). hockeyR: Collect and Clean Hockey Stats. R package version 1.3.1, https://github.com/danmorse314/hockeyR.