Formula One 2018: Clustering Qualifying Results

Introduction

In this blog post we are going to use a clustering algorithm to group Formula One (F1) teams based on their qualifying results in 2018. While it is relatively easy to read qualifying results, it is not as simple to get a clear picture of the relative performance between teams. Clustering gives us a way to track the performance of teams over time and analyze patterns.

Dataset

The data used in this blog post was downloaded from the Ergast Developer API. We are using the qualifying results for the first nine races of the current (2018) season.

Methodology

The value used for each team is the average of its two drivers' best lap times over the full qualifying session, regardless of whether each lap was set in Q1, Q2, or Q3. When a driver did not set a time (e.g., Verstappen at Monaco), we simply omit his record, so the team's time is then just the teammate's best lap.
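The averaging step can be sketched as follows. This is a minimal sketch and not the code used for this post. It assumes each driver's result is a dict with a `team` name and optional `Q1`/`Q2`/`Q3` lap-time strings. The record shape, the helper names and the sample data are all hypothetical.

```python
from statistics import mean


def parse_lap(lap: str) -> float:
    """Convert an 'M:SS.sss' lap time string into seconds."""
    minutes, seconds = lap.split(":")
    return int(minutes) * 60 + float(seconds)


def team_best_times(results: list[dict]) -> dict[str, float]:
    """Average the teammates' best laps over Q1, Q2 and Q3.

    Each record is assumed (hypothetical shape) to look like
    {"team": ..., "Q1": "1:22.000", "Q2": ..., "Q3": ...};
    a missing or empty session key means no lap was set in that session.
    """
    bests: dict[str, list[float]] = {}
    for record in results:
        laps = [parse_lap(record[q]) for q in ("Q1", "Q2", "Q3") if record.get(q)]
        if not laps:
            continue  # driver set no time: omit the record entirely
        bests.setdefault(record["team"], []).append(min(laps))
    # A team with one omitted driver is represented by the teammate alone.
    return {team: mean(times) for team, times in bests.items()}


# Hypothetical sample: one Team B driver set no time.
sample = [
    {"team": "Team A", "Q1": "1:22.000", "Q2": "1:21.500", "Q3": "1:21.800"},
    {"team": "Team A", "Q1": "1:22.100"},
    {"team": "Team B", "Q1": ""},
    {"team": "Team B", "Q1": "1:23.000"},
]
print(team_best_times(sample))
```

In this sample, Team A averages 81.5 s and 82.1 s. Team B is represented by its single timed driver.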

Teams will be divided into three groups. The clustering algorithm attaches no meaning to the groups, because it only tries to keep each group as tight as possible. We will nevertheless name the clusters as follows: front-runners (1), midfield (2), and back of the pack (3).
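Clustering algorithms number their groups in no particular order, so the names have to be attached afterwards. One simple way is to rank the clusters by their average lap time, fastest first. The helper name below is hypothetical.

```python
CLUSTER_NAMES = {1: "front-runners", 2: "midfield", 3: "back of the pack"}


def rank_labels(labels: list[int], centroids: list[float]) -> list[int]:
    """Renumber arbitrary cluster ids so that 1 is the fastest centroid.

    labels[i] is the 0-based cluster id of team i, i.e. an index into centroids.
    """
    fastest_first = sorted(range(len(centroids)), key=lambda j: centroids[j])
    rank = {old: new for new, old in enumerate(fastest_first, start=1)}
    return [rank[label] for label in labels]


ranked = rank_labels([2, 0, 1], [83.0, 84.5, 81.8])
print([CLUSTER_NAMES[r] for r in ranked])  # ['front-runners', 'midfield', 'back of the pack']
```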

The algorithm used is K-means with three clusters (k = 3). K-means partitions the data into a specified number of clusters so as to minimize the within-cluster variance. The resulting cluster averages (also known as centroids) serve as representative lap times for each group. They are not times that exist in the dataset. Each one is the mean of the lap times in its group.
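Below is a minimal sketch of one-dimensional K-means using Lloyd's algorithm and only the standard library. It is not necessarily the implementation behind this post. K-means can stop in a local optimum, so the starting centroids matter. Libraries such as scikit-learn's `KMeans` use k-means++ seeding and several restarts (its `n_init` parameter). This sketch instead uses a deterministic start that we chose for illustration.

```python
from statistics import mean


def parse_lap(lap: str) -> float:
    """Convert an 'M:SS.sss' lap time string into seconds."""
    minutes, seconds = lap.split(":")
    return int(minutes) * 60 + float(seconds)


def kmeans_1d(values: list[float], k: int = 3, max_iter: int = 100):
    """Lloyd's algorithm (the standard K-means iteration) for 1-D data.

    Centroids start as the means of k contiguous chunks of the sorted values.
    This makes the run deterministic and, in one dimension, keeps cluster 0
    as the fastest group. Assumes len(values) >= k.
    """
    ordered = sorted(values)
    n = len(ordered)
    centroids = [mean(ordered[i * n // k:(i + 1) * n // k]) for i in range(k)]
    labels: list[int] = []
    for _ in range(max_iter):
        # Assignment step: each lap time joins its nearest centroid.
        labels = [min(range(k), key=lambda j: abs(v - centroids[j])) for v in values]
        # Update step: each centroid moves to the mean of its members;
        # an empty cluster keeps its previous centroid.
        new_centroids = []
        for j in range(k):
            members = [v for v, label in zip(values, labels) if label == j]
            new_centroids.append(mean(members) if members else centroids[j])
        if new_centroids == centroids:
            break  # no centroid moved, so the assignments are stable
        centroids = new_centroids
    return labels, centroids


# Australian GP team best times, copied from the table below.
australia = {
    "Mercedes": "1:21.626", "Ferrari": "1:21.832", "Red Bull": "1:22.015",
    "Haas F1 Team": "1:23.263", "Renault": "1:23.296", "McLaren": "1:23.724",
    "Force India": "1:24.253", "Williams": "1:24.575", "Sauber": "1:24.596",
    "Toro Rosso": "1:24.913",
}
labels, centroids = kmeans_1d([parse_lap(t) for t in australia.values()], k=3)
for team, label in zip(australia, labels):
    print(f"{team}: cluster {label + 1}")
```

On these times the sketch puts Mercedes, Ferrari and Red Bull in cluster 1, Haas, Renault and McLaren in cluster 2, and the remaining four teams in cluster 3. Its centroids can differ from the published cluster times in the last digit because the best times in the table are already rounded.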

2018 Australian Grand Prix

In this section we are going to look at the Australian Grand Prix in detail and then apply the same methodology to the other qualifying sessions in the 2018 season.

The table below shows for every F1 team:

Best Time: average of the best lap times posted by the two drivers
Cluster: cluster the team belongs to
Cluster Time: average best time of the teams in the cluster (the centroid)
Gap to Cluster: best time minus cluster time, in seconds (a negative gap means faster than the cluster time)

Team	Best Time	Cluster	Cluster Time	Gap to Cluster (s)
Mercedes	1:21.626	1	1:21.825	-0.199
Ferrari	1:21.832	1	1:21.825	0.008
Red Bull	1:22.015	1	1:21.825	0.191
Haas F1 Team	1:23.263	2	1:23.428	-0.165
Renault	1:23.296	2	1:23.428	-0.132
McLaren	1:23.724	2	1:23.428	0.297
Force India	1:24.253	3	1:24.584	-0.331
Williams	1:24.575	3	1:24.584	-0.009
Sauber	1:24.596	3	1:24.584	0.011
Toro Rosso	1:24.913	3	1:24.584	0.329

In Australia, the front-runners were Mercedes, Ferrari, and Red Bull. Three teams (Haas, Renault, and McLaren) made up the midfield. The remaining four teams (Force India, Williams, Sauber, and Toro Rosso) were clustered together as the back of the pack.

The cluster times are at least one second apart, and every team is within four tenths of its cluster time. This is exactly what we want to see. The groups are far apart compared with the spread inside each group, which suggests that the clustering algorithm did a decent job.
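Both checks in the paragraph above are easy to automate. The sketch below recomputes the cluster times and gaps from the Australian best times and cluster numbers in the table. The function names are hypothetical, and the thresholds (one second, four tenths) come from the text. Recomputing from the rounded table values can move a gap by a millisecond. For example, this sketch gives -0.198 for Mercedes where the table shows -0.199.

```python
from statistics import mean


def cluster_gaps(times: dict[str, float], clusters: dict[str, int]):
    """Return each cluster's mean time (centroid) and each team's gap to it.

    A negative gap means the team was faster than its cluster time.
    """
    centroids = {
        c: mean([times[t] for t in times if clusters[t] == c])
        for c in set(clusters.values())
    }
    gaps = {t: times[t] - centroids[clusters[t]] for t in times}
    return centroids, gaps


def well_separated(centroids, gaps, min_separation=1.0, max_gap=0.4) -> bool:
    """True if neighboring centroids are at least min_separation seconds apart
    and every team is within max_gap seconds of its own centroid.
    Assumes at least two clusters."""
    ordered = sorted(centroids.values())
    separations = [b - a for a, b in zip(ordered, ordered[1:])]
    return min(separations) >= min_separation and max(abs(g) for g in gaps.values()) <= max_gap


# Australian GP: best times in seconds and cluster numbers from the table above.
times = {
    "Mercedes": 81.626, "Ferrari": 81.832, "Red Bull": 82.015,
    "Haas F1 Team": 83.263, "Renault": 83.296, "McLaren": 83.724,
    "Force India": 84.253, "Williams": 84.575, "Sauber": 84.596,
    "Toro Rosso": 84.913,
}
clusters = {
    "Mercedes": 1, "Ferrari": 1, "Red Bull": 1,
    "Haas F1 Team": 2, "Renault": 2, "McLaren": 2,
    "Force India": 3, "Williams": 3, "Sauber": 3, "Toro Rosso": 3,
}
centroids, gaps = cluster_gaps(times, clusters)
print(well_separated(centroids, gaps))  # True
```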

Overall Results

The following table shows the cluster each team was assigned to at every race of the 2018 season so far, along with the sum of those cluster numbers per team. A lower total means a better overall placement. The best possible total is 9, which means being among the front-runners at all nine races.

Clusters

Team	AUS	BAH	CHI	AZE	SPA	MON	CAN	FRA	AUT	Total
Mercedes	1	1	1	1	1	1	1	1	1	9
Ferrari	1	1	1	1	1	1	1	1	1	9
Red Bull	1	1	1	1	1	1	1	1	2	10
Haas F1 Team	2	2	2	2	2	3	2	2	2	19
Renault	2	2	2	2	2	2	2	2	3	19
Force India	3	2	2	1	2	2	2	2	3	19
McLaren	2	2	2	2	2	2	2	3	3	20
Toro Rosso	3	2	3	3	2	3	2	3	3	24
Sauber	3	3	3	2	3	3	3	2	3	25
Williams	3	3	3	2	3	3	3	3	3	26
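The Total column is simply the sum of each row. Here is a short sketch that recomputes it from the table above, transcribed into a dictionary. The variable names are our own.

```python
# Cluster numbers per race (AUS, BAH, CHI, AZE, SPA, MON, CAN, FRA, AUT),
# transcribed from the Clusters table.
CLUSTERS = {
    "Mercedes":     [1, 1, 1, 1, 1, 1, 1, 1, 1],
    "Ferrari":      [1, 1, 1, 1, 1, 1, 1, 1, 1],
    "Red Bull":     [1, 1, 1, 1, 1, 1, 1, 1, 2],
    "Haas F1 Team": [2, 2, 2, 2, 2, 3, 2, 2, 2],
    "Renault":      [2, 2, 2, 2, 2, 2, 2, 2, 3],
    "Force India":  [3, 2, 2, 1, 2, 2, 2, 2, 3],
    "McLaren":      [2, 2, 2, 2, 2, 2, 2, 3, 3],
    "Toro Rosso":   [3, 2, 3, 3, 2, 3, 2, 3, 3],
    "Sauber":       [3, 3, 3, 2, 3, 3, 3, 2, 3],
    "Williams":     [3, 3, 3, 2, 3, 3, 3, 3, 3],
}

totals = {team: sum(row) for team, row in CLUSTERS.items()}
# Best possible total: cluster 1 at every race, i.e. one point per race.
best_possible = len(next(iter(CLUSTERS.values())))

for team, total in sorted(totals.items(), key=lambda kv: kv[1]):
    print(f"{team:<13} {total:>3}")
```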

Front Runners

Mercedes and Ferrari achieved the perfect score (front-runners at every race).
Red Bull closely matched them, except at the Austrian GP, where it was clustered as midfield.
In the Austrian GP, a not-so-great day for Red Bull and an incredible performance from Haas put the two teams together in the midfield, while the remaining six teams were labeled back of the pack.
Force India was the only other team to join the front-runners, which happened in Baku. That weekend Sergio Perez finished third in the race, the only podium so far this season for a driver outside Mercedes, Ferrari, and Red Bull.

Midfield and Back of the Pack

Renault and Haas were in the midfield on every occasion except one, while every other team outside the top three was in the back of the pack at least twice.
Toro Rosso's qualifying in Baku was more than six seconds slower than any other team's. This gave Toro Rosso a back-of-the-pack cluster of its own and allowed Sauber and Williams to join the midfield for the first time this season. Sauber also joined the midfield in France, where Charles Leclerc qualified eighth.
McLaren dropped to the back of the pack for the first time at the French GP, arguably their worst qualifying performance of the year, and stayed there in Austria.

Overall Standings

Clustering the totals from the Clusters table into three groups gives the following overall picture:

Front-runners: Mercedes, Ferrari, and Red Bull
Midfield: Haas, Renault, Force India, and McLaren
Back of the pack: Toro Rosso, Sauber, and Williams
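The same kind of clustering can be applied to the Total column. This is a minimal standard-library sketch. It assumes K-means via Lloyd's algorithm with k = 3, because the post does not say which implementation was used. The starting centroids are means of contiguous chunks of the sorted totals, which is our own choice.

```python
from statistics import mean

# Total column of the Clusters table (sum of each team's cluster numbers).
totals = {
    "Mercedes": 9, "Ferrari": 9, "Red Bull": 10, "Haas F1 Team": 19,
    "Renault": 19, "Force India": 19, "McLaren": 20, "Toro Rosso": 24,
    "Sauber": 25, "Williams": 26,
}


def kmeans_1d(values, k=3, max_iter=100):
    """Lloyd's K-means on 1-D data, started from k contiguous chunks of the sorted values."""
    xs = sorted(values)
    n = len(xs)
    centroids = [mean(xs[i * n // k:(i + 1) * n // k]) for i in range(k)]
    for _ in range(max_iter):
        # Assignment step: put every value in the group of its nearest centroid.
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            groups[nearest].append(v)
        # Update step: move each centroid to its group mean (keep it if the group is empty).
        new_centroids = [mean(g) if g else c for g, c in zip(groups, centroids)]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids


centroids = kmeans_1d(list(totals.values()))
names = ["Front-runners", "Midfield", "Back of the pack"]  # centroids come out smallest first
overall = {
    team: names[min(range(3), key=lambda j: abs(total - centroids[j]))]
    for team, total in totals.items()
}
print(overall)
```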

Gaps

The following table shows the gaps to the cluster time and the best lap times for all teams and races. Each cell gives the gap to the cluster time in seconds, followed by the team's best time.

Team	AUS	BAH	CHI	AZE	SPA	MON	CAN	FRA	AUT
Mercedes	-0.199 1:21.626	-0.191 1:28.171	0.097 1:31.650	-0.303 1:41.757	-0.297 1:16.192	0.237 1:11.336	-0.034 1:10.926	-0.403 1:30.087	-0.211 1:03.139
Ferrari	0.008 1:21.832	-0.333 1:28.029	-0.415 1:31.138	-0.066 1:41.993	-0.031 1:16.458	0.053 1:11.152	-0.031 1:10.929	0.095 1:30.586	0.211 1:03.561
Red Bull	0.191 1:22.015	0.523 1:28.885	0.319 1:31.871	-0.107 1:41.952	0.328 1:16.817	-0.290 1:10.810	0.066 1:11.026	0.309 1:30.799	-0.027 1:03.918
Haas F1 Team	-0.165 1:23.263	0.041 1:29.944	-0.167 1:32.754	-0.229 1:43.674	-0.585 1:17.658	0.142 1:13.060	0.100 1:12.605	-0.572 1:31.490	0.027 1:03.971
Renault	-0.132 1:23.296	-0.409 1:29.494	-0.265 1:32.656	-0.694 1:43.208	0.113 1:18.356	0.053 1:12.270	-0.499 1:12.006	0.003 1:32.065	-0.454 1:04.618
McLaren	0.297 1:23.724	0.442 1:30.345	0.447 1:33.368	0.347 1:44.249	-0.222 1:18.021	0.057 1:12.275	0.355 1:12.860	-0.094 1:33.069	0.046 1:05.117
Force India	-0.331 1:24.253	0.112 1:30.015	-0.014 1:32.907	0.475 1:42.534	0.474 1:18.717	-0.110 1:12.107	-0.266 1:12.239	0.202 1:32.264	-0.010 1:05.061
Williams	-0.009 1:24.575	0.109 1:31.458	-0.095 1:34.173	-0.167 1:43.735	0.442 1:19.959	0.004 1:12.921	-0.005 1:13.616	0.520 1:33.682	0.221 1:05.293
Sauber	0.011 1:24.596	-0.108 1:31.241	0.415 1:34.683	0.744 1:44.646	-0.441 1:19.076	0.071 1:12.989	0.005 1:13.627	0.367 1:32.429	0.151 1:05.222
Toro Rosso	0.329 1:24.913	-0.186 1:29.716	-0.320 1:33.948	0.000 1:50.924	0.219 1:18.462	-0.218 1:12.700	0.311 1:12.817	-0.427 1:32.736	0.048 1:05.120

Conclusion

Clustering qualifying results helps create a simplified view of them. By grouping teams, we don't have to remember the exact lap times or gaps and can concentrate on the clusters instead.

As for the findings, it was not surprising to see the top three teams in their own cluster. It was more interesting to find, for example, that Force India's qualifying in Azerbaijan was the most remarkable of any team outside the top three, or that Williams had their only midfield result in Azerbaijan.

Using three clusters might have been a bit limiting. At some races, such as the Austrian GP, the split is 2-2-6, which leaves the back of the pack very crowded. Four clusters would probably be a better fit, but three allowed for very simple naming and explanations.
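The comparison between three and four clusters can be made concrete with the sketch below. It swaps Lloyd's algorithm for an exhaustive search. In one dimension an optimal K-means partition always consists of contiguous runs of the sorted values, so with ten teams every split can be tried: 36 for k = 3 and 84 for k = 4. This finds the true minimum of the within-cluster sum of squares. The Austrian times are taken from the Gaps table, and the function names are our own.

```python
from itertools import combinations
from statistics import mean

# Austrian GP best times in seconds, from the Gaps table.
austria = {
    "Mercedes": 63.139, "Ferrari": 63.561, "Red Bull": 63.918,
    "Haas F1 Team": 63.971, "Renault": 64.618, "McLaren": 65.117,
    "Force India": 65.061, "Williams": 65.293, "Sauber": 65.222,
    "Toro Rosso": 65.120,
}


def sse(values):
    """Within-cluster sum of squared deviations from the cluster mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def exact_kmeans_1d(times, k):
    """Globally optimal K-means partition for 1-D data.

    An optimal partition splits the sorted values into contiguous runs,
    so trying every set of k - 1 cut points finds the minimum cost.
    """
    ranked = sorted(times, key=times.get)  # team names, fastest first
    best_cost, best_groups = float("inf"), None
    for cuts in combinations(range(1, len(ranked)), k - 1):
        bounds = (0, *cuts, len(ranked))
        groups = [ranked[a:b] for a, b in zip(bounds, bounds[1:])]
        cost = sum(sse([times[t] for t in g]) for g in groups)
        if cost < best_cost:
            best_cost, best_groups = cost, groups
    return best_cost, best_groups


for k in (3, 4):
    cost, groups = exact_kmeans_1d(austria, k)
    print(k, [len(g) for g in groups], round(cost, 3))
# 3 [2, 2, 6] 0.372
# 4 [2, 2, 1, 5] 0.125
```

On these rounded times, k = 3 reproduces the 2-2-6 split from the Clusters table. With k = 4, Mercedes/Ferrari and Red Bull/Haas stay together, and Renault is split off from the other five teams. The within-cluster sum of squares can only go down as k grows. It is therefore better read as "how much does the extra cluster buy" than as a score to minimize.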

Did the cluster information help you understand the overall relative standing of the teams? Were any patterns missed in the analysis of the Clusters or Gaps tables? Feel free to leave feedback in the comments section below.