National Women’s Soccer League Team Statistics
Motivation
The National Women’s Soccer League (NWSL) is the top professional women’s soccer league in the United States. While a team’s record ultimately determines their ranking, goal differential (goals scored - goals conceded) is often a better indicator of a team’s ability. But what aspects of a team’s performance are related to their goal differential? The NWSL records a variety of statistics describing a team’s performance, such as the percentage of time they maintain possession, percentage of shots on target, etc. With this dataset, you can explore variation between teams as well as which statistics are relevant predictor variables of goal differential.
Data
This dataset contains statistics about the regular season performance for each NWSL team from 2016 to 2022 (excluding 2020 which was cancelled due to COVID).
The data was collected using the nwslR
package in R
.
Variable | Description |
---|---|
team_name | Name of NWSL team |
season | Regular season year of team’s statistics |
games_played | Number of games team played in season |
goal_differential | Goals scored - goals conceded |
goals | Number of goals scores |
goals_conceded | Number of goals conceded |
cross_accuracy | Percent of crosses that were successful |
goal_conversion_pct | Percent of shots scored |
pass_pct | Pass accuracy |
pass_pct_opposition_half | Pass accuracy in opposition half |
possession_pct | Percentage of overall ball possession the team had during the season |
shot_accuracy | Percentage of shots on target |
tackle_success_pct | Percent of successful tackles |
Questions
Build a linear regression model to predict goal differential as a function of the team’s statistics. Which variables, if any, are predictive of a team’s goal differential in a season? Describe their relationships.
Using principal component analysis, project the team seasons into a lower dimensional space using only the team statistics that do not include goals scored or conceded. Choose an appropriate number of principal components, and describe which statistics contribute to your selected number of components. Based on your results, which team statistics explain the most variation between team-seasons in the dataset? Which team statistics are independent of each other?
References
Dror A (2023). nwslR: Compiles dataset for the National Women’s Soccer League (NWSL). R package version 0.0.0.9001.