National Basketball Association Player Statistics

clustering
principal component analysis
NBA player statistics from the 2021-2022 regular season
Author

Ron Yurko

Published

March 25, 2023

Motivation

The National Basketball Association (NBA) is the top men’s professional basketball league in the world. While players have predefined positions, the sport is becoming increasingly positionless - with centers attempting more three point shots and guards driving the ball inside to dunk. With this dataset, you can explore clustering NBA players based on various types of statistics and compare your players labels to the predefined positions.

Data

This dataset contains statistics about 812 player-team stints for during the 2021-2022 NBA regular season. For players that played for \(T\) teams during the season (due to trade), there are \(T+1\) rows with one row for their performance with each of the \(T\) teams and another row indicating their total performance (where tm = TOT) across the full season regardless of team. The counting stats are reported on a per 100 team possessions scale, to normalize for playing time differences.

The data was collected using the ballr package in R., which gathers data from basketball-reference.com.

nba-player-stats.csv
Variable Description
player Name of player
pos Player’s designated position
age Player’s age on February 1st of the season
tm Name of team
g Number of games
gs Number of games started
mp Number of minutes played
fg Field goals per 100 team possessions
fga Field goal attempts per 100 team possessions
fgpercent Field goal percentage
x3p 3 point field goals per 100 team possessions
x3pa 3 point field goal attempts per 100 team possessions
x3ppercent 3 point field goal percentage
x2p 2 point field goals per 100 team possessions
x2pa 2 point field goal attempts per 100 team possessions
x2ppercent 2 point field goal percentage
ft Free throws per 100 team possessions
fta Free throw attempts per 100 team possessions
ftpercent Free throw percentage
orb Offensive rebounds per 100 team possessions
drb Defensive rebounds per 100 team possessions
trb Total rebounds per 100 team possessions
ast Assists per 100 team possessions
stl Steals per 100 team possessions
blk Blocks per 100 team possessions
tov Turnovers per 100 team possessions
pf Personal fouls per 100 team possessions
pts Points per 100 team possessions
ortg Offensive Rating - an estimate of points produced per 100 possessions scale
drtg Defensive Rating - an estimate of points allowed per 100 possessions scale

Questions

  1. Apply a clustering technique using the player statistics to create new labels for the types of NBA players, and compare to their known position labels. Compare and contrast your data-defined roles to the traditional basketball positions.

  2. Using principal component analysis, project the players into a lower dimensional space. Choose an appropriate number of principal components, and describe which statistics contribute to your selected number of components. Based on your results, which player statistics explain the most variation between players in the dataset? Which stats are independent of each other?

References

Elmore R (2020). ballr: Access to Current and Historical Basketball Data. R package version 0.2.6.