National Women’s Soccer League Team Statistics

linear regression
principal component analysis
NWSL team statistics from 2016 to 2022
Author

Ron Yurko

Published

March 18, 2023

Motivation

The National Women’s Soccer League (NWSL) is the top professional women’s soccer league in the United States. While a team’s record ultimately determines their ranking, goal differential (goals scored - goals conceded) is often a better indicator of a team’s ability. But what aspects of a team’s performance are related to their goal differential? The NWSL records a variety of statistics describing a team’s performance, such as the percentage of time they maintain possession, percentage of shots on target, etc. With this dataset, you can explore variation between teams as well as which statistics are relevant predictor variables of goal differential.

Data

This dataset contains statistics about the regular season performance for each NWSL team from 2016 to 2022 (excluding 2020 which was cancelled due to COVID).

The data was collected using the nwslR package in R.

nwsl-team-stats.csv
Variable Description
team_name Name of NWSL team
season Regular season year of team’s statistics
games_played Number of games team played in season
goal_differential Goals scored - goals conceded
goals Number of goals scores
goals_conceded Number of goals conceded
cross_accuracy Percent of crosses that were successful
goal_conversion_pct Percent of shots scored
pass_pct Pass accuracy
pass_pct_opposition_half Pass accuracy in opposition half
possession_pct Percentage of overall ball possession the team had during the season
shot_accuracy Percentage of shots on target
tackle_success_pct Percent of successful tackles

Questions

  1. Build a linear regression model to predict goal differential as a function of the team’s statistics. Which variables, if any, are predictive of a team’s goal differential in a season? Describe their relationships.

  2. Using principal component analysis, project the team seasons into a lower dimensional space using only the team statistics that do not include goals scored or conceded. Choose an appropriate number of principal components, and describe which statistics contribute to your selected number of components. Based on your results, which team statistics explain the most variation between team-seasons in the dataset? Which team statistics are independent of each other?

References

Dror A (2023). nwslR: Compiles dataset for the National Women’s Soccer League (NWSL). R package version 0.0.0.9001.