NASCAR Driver Ratings: 2007 - 2022
Data visualization
Data wrangling
Linear regression
Transformations in regression
Linear regression conditions
Multiple regression
NASCAR Driver Ratings and statistics from 2007-2022
Data
The data comes from the NASCAR website and shows the season statistics from 2007-2022. Each row displays the metrics of a racer for that specific year. The data frame contains 1111 rows of observations and 20 variables (and a row number column).
| Variable | Description |
|---|---|
| …1 | Row number from the original dataset (can be ignored or used as an index) |
| Driver | Name of the driver for the given season |
| Wins | The sum of the driver’s victories |
| AvgStart | The sum of the driver’s starting positions divided by the number of races |
| AvgMidRace | The sum of the driver’s mid race positions divided by the number of races |
| AvgFinish | The sum of the driver’s finishing positions divided by the number of races |
| AvgPos | The sum of the driver’s position each lap divided by the number of laps |
| PassDiff | The sum of green flag passes minus green times passed |
| GreenFlagPasses | Number of green flag passes performed by the driver |
| GreenFlagPassed | Number of times driver is passed during green flag |
| QualityPasses | Number of passes in the top 15 while under green flag conditions by driver |
| PercentQualityPasses | The sum of quality passes divided by green flag passes |
| NumFastestLaps | Number of where the driver had the fastest speed on the lap |
| LapsInTop15 | Number of laps completed while running in a top 15 position |
| PercentLapsInTop15 | The sum of the laps run in the top 15 divided by total laps completed |
| LapsLed | The sum of the laps led in a race |
| PercentLapsLed | The sum of the laps led in the race |
| TotalLaps | The sum of the laps completed by a driver that year |
| DriverRating | Formula combining wins, finish, top15-finish, average running position while on lead lap, average speed under green, fastest lap, led most laps, and lead lap finish with a maximum rating of 150 points |
| Points | Total number of championship points earned by the driver in that season |
| Year | Year in which the season’s statistics were recorded |
Questions
What is the best way to model the relationship between average finish and driver rating? Are the linear model assumptions met?
Which predictors best predict driver rating?