National Basketball Association Player Statistics
Motivation
The National Basketball Association (NBA) is the top men’s professional basketball league in the world. While players have predefined positions, the sport is becoming increasingly positionless - with centers attempting more three point shots and guards driving the ball inside to dunk. With this dataset, you can explore clustering NBA players based on various types of statistics and compare your players labels to the predefined positions.
Data
This dataset contains statistics about 812 player-team stints for during the 2021-2022 NBA regular season. For players that played for \(T\) teams during the season (due to trade), there are \(T+1\) rows with one row for their performance with each of the \(T\) teams and another row indicating their total performance (where tm = TOT
) across the full season regardless of team. The counting stats are reported on a per 100 team possessions scale, to normalize for playing time differences.
The data was collected using the ballr
package in R
., which gathers data from basketball-reference.com.
Variable | Description |
---|---|
player | Name of player |
pos | Player’s designated position |
age | Player’s age on February 1st of the season |
tm | Name of team |
g | Number of games |
gs | Number of games started |
mp | Number of minutes played |
fg | Field goals per 100 team possessions |
fga | Field goal attempts per 100 team possessions |
fgpercent | Field goal percentage |
x3p | 3 point field goals per 100 team possessions |
x3pa | 3 point field goal attempts per 100 team possessions |
x3ppercent | 3 point field goal percentage |
x2p | 2 point field goals per 100 team possessions |
x2pa | 2 point field goal attempts per 100 team possessions |
x2ppercent | 2 point field goal percentage |
ft | Free throws per 100 team possessions |
fta | Free throw attempts per 100 team possessions |
ftpercent | Free throw percentage |
orb | Offensive rebounds per 100 team possessions |
drb | Defensive rebounds per 100 team possessions |
trb | Total rebounds per 100 team possessions |
ast | Assists per 100 team possessions |
stl | Steals per 100 team possessions |
blk | Blocks per 100 team possessions |
tov | Turnovers per 100 team possessions |
pf | Personal fouls per 100 team possessions |
pts | Points per 100 team possessions |
ortg | Offensive Rating - an estimate of points produced per 100 possessions scale |
drtg | Defensive Rating - an estimate of points allowed per 100 possessions scale |
Questions
Apply a clustering technique using the player statistics to create new labels for the types of NBA players, and compare to their known position labels. Compare and contrast your data-defined roles to the traditional basketball positions.
Using principal component analysis, project the players into a lower dimensional space. Choose an appropriate number of principal components, and describe which statistics contribute to your selected number of components. Based on your results, which player statistics explain the most variation between players in the dataset? Which stats are independent of each other?
References
Elmore R (2020). ballr: Access to Current and Historical Basketball Data. R package version 0.2.6.