This weekend, I’m holding a time trial competition at my office (aka the HPD-LAB). The event features 2 separate time trials. In each time trial, drivers will have 20 minutes to set their fastest lap.
- Tarmac – Brands Hatch Indy in a Formula Ford
- Rallycross – Holjes RX in a Chevy Monza 500EF (FWD)
While everyone has driven Brands Indy before, most have not driven a Formula Ford. I don’t think anyone has driven Holjes in the Monza. These time trials are therefore as much about adapting quickly as driving quickly.
It’s easy to determine who the fastest driver is in each category. Simply sort by the lap time. But when it comes to the title of overall fastest driver, things get complicated.
Sum of lap times is short-sighted
A simple solution would be to add the two lap times together. This might work okay here because both lap times will be under a minute. But as a general methodology, adding lap times is a dumb idea. Consider the problem of finding the fastest runner. Athletes who perform well at 100m do poorly at marathons and vice-versa. As marathon times can be separated by minutes while 100m times differ by fractions of a second, simply summing the two would hand the title to the fastest marathon runner.
Subjectivity is unavoidable
Imagine driving your car from home to the race track. It’s 100 km and it takes you 1 hour. Your speed is 100 km/hr. On the way back, there’s lots of traffic, and it takes 2 hours. Your return speed is 50 km/hr. What is your average speed? One way to calculate this is to take the average of 100 km/hr and 50 km/hr = 75 km/hr. Another way is to divide the total distance by total time: 200 km / 3 hr = 66.7 km/hr.
Which of these methods provides the correct aggregate speed? If you thought this exercise would end in truth, you’re wrong. There are many ways to calculate central tendency, and ultimately you get to choose. You can use the arithmetic mean (75.0), harmonic mean (66.7), geometric mean (70.7), or some other calculation as the final value. Again, you get to choose, and that makes everything we do partially subjective.
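All three means can be checked directly with Python’s standard library (the `statistics` module; `geometric_mean` requires Python 3.8+):

```python
# Three candidate "average speeds" for the trip described above:
# out at 100 km/h, back at 50 km/h.
from statistics import mean, geometric_mean, harmonic_mean

speeds = [100.0, 50.0]  # km/h for the two legs

arithmetic = mean(speeds)           # 75.0
geometric = geometric_mean(speeds)  # ~70.7
harmonic = harmonic_mean(speeds)    # ~66.7, equals total distance / total time
```

The harmonic mean is the one that matches the "total distance divided by total time" calculation, which is why it is often preferred for rates — but as the text says, that preference is still a choice.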
Transparency is essential
The stated goal is to find the fastest driver. But this can’t be done without some subjectivity. In cases such as these, one must be transparent about how the calculations are made and how the final results depend on the parameters. That way, other people can replicate and extend your study, which may lead to different conclusions based on the exact same data. So let’s take a look at some common ways to aggregate performance before deciding on the overall winner.
Sum of ranks is not that simple
One way to determine the overall winner is to rank each driver in each event and then sum up the ranks. Like golf, a lower score is better. But even this simple scoring scheme has a few nuances. What happens when multiple drivers end with the exact same lap time? Granted, this is unlikely, but in many testing scenarios (e.g. the 13 best summer tires of 2023), it’s possible for multiple contestants to end with the same score in some category. Let’s say the times are 60, 61, 61, and 62 seconds. The fastest driver is clearly 1st, but do the two middle drivers both get 2nd, leaving the slowest driver 3rd? Or 4th? Or do the middle two both get 3rd? Although I’ve never seen it done, one could also average the ranks for drivers who tie.
- No gaps after ties: 1, 2, 2, 3
- Gaps after ties: 1, 2, 2, 4
- Gap before ties: 1, 3, 3, 4
- Average tied ranks: 1, 2.5, 2.5, 4
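The four tie-handling schemes can be sketched in plain Python (the helper names are mine; for what it’s worth, `scipy.stats.rankdata` offers the same options via its `dense`, `min`, `max`, and `average` methods):

```python
# Rank lap times (lower is better) under one of four tie-handling schemes.
def rank_times(times, scheme):
    order = sorted(times)
    rev = order[::-1]
    uniq = sorted(set(times))
    min_rank = lambda t: order.index(t) + 1         # first slot a time occupies
    max_rank = lambda t: len(order) - rev.index(t)  # last slot a time occupies
    if scheme == "no_gaps":      # 1, 2, 2, 3
        return [uniq.index(t) + 1 for t in times]
    if scheme == "gaps_after":   # 1, 2, 2, 4
        return [min_rank(t) for t in times]
    if scheme == "gap_before":   # 1, 3, 3, 4
        return [max_rank(t) for t in times]
    if scheme == "average":      # 1.0, 2.5, 2.5, 4.0
        return [(min_rank(t) + max_rank(t)) / 2 for t in times]
    raise ValueError(scheme)

rank_times([60, 61, 61, 62], "average")  # [1.0, 2.5, 2.5, 4.0]
```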
Point systems are common
Many racing organizations give out points for different placings. These are generally non-linear with more points awarded to the faster drivers. Here are a few.
- Formula 1 historic (early): 8, 6, 4, 3, 2
- Formula 1 historic (later): 10, 6, 4, 3, 2, 1
- Formula 1 historic (even later): 10, 8, 6, 5, 4, 3, 2, 1
- Formula 1 current: 25, 18, 15, 12, 10, 8, 6, 4, 2, 1
- MotoGP and World Superbike current: 25, 20, 16, 13, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1
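To see how much the choice of table is part of the game, here is a sketch comparing two of the F1 tables above on a hypothetical season (the positions and helper names are mine, chosen to make the contrast visible):

```python
# Points are arbitrarily weighted ranks: the same finishing positions can
# produce a different champion under different tables.
F1_OLD = [10, 6, 4, 3, 2, 1]                      # the "historic (later)" table
F1_CURRENT = [25, 18, 15, 12, 10, 8, 6, 4, 2, 1]

def points(position, table):
    """Points for a 1-indexed finishing position; 0 if outside the table."""
    return table[position - 1] if position <= len(table) else 0

def season_total(positions, table):
    return sum(points(p, table) for p in positions)

# Hypothetical season: driver A wins three races but otherwise finishes poorly;
# driver B never wins but is always near the front.
driver_a = [1, 1, 1, 12, 15]
driver_b = [2, 2, 2, 2, 3]

season_total(driver_a, F1_OLD)      # 30
season_total(driver_b, F1_OLD)      # 28  -> A is champion under the old table
season_total(driver_a, F1_CURRENT)  # 75
season_total(driver_b, F1_CURRENT)  # 87  -> B is champion under the current table
```

The old table weights wins more heavily relative to 2nd place (10 vs 6) than the current one does (25 vs 18), so consistent runners-up fare better today.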
Points are just arbitrarily weighted ranks. The point system has no basis in truth. Points are the rules of a game. Is the winner at the end of the season the fastest driver? Hard to say. However, it’s easy to say they are the winner of the points-scoring game.
One might ask whether the goal of a points-scoring system is actually to find the fastest driver. Do we want a system that always finds the best driver, or do we want a system that sometimes gives worse drivers a chance at winning the overall title? Uncertainty can create excitement.
Measuring performance isn’t that simple
If you’ve ever watched a bicycle race like the Tour de France, you know that stages often end with a huge number of riders crossing the finish line at essentially the same time because they are part of the peloton (“little ball” in French). Handing out vastly different ranks seems like a stupid thing to do when all of the riders have performed the same. So why not measure performance rather than rank it? As discussed earlier, simply summing up times is stupid. The data has to be normalized in some way to make running 100m equivalent to running a marathon.
- Single-sided scaling: One way to tackle this problem is to give the fastest driver 100% and score everyone else relative to that performance. Continuing the previous example, a driver who completes a lap in 60 seconds gets 100%. The two drivers at 61 get 60/61, or 98.36%, and the driver at 62 gets 96.77%. This doesn’t give much of a “win” to the winning driver, does it? Feels wrong.
- Double-sided scaling: Another transformation would be to give the top driver 100% and the bottom driver 0% and scale the others in between. This results in 100%, 50%, 50%, 0%. While this certainly spreads out the performance better, it also feels wrong.
- Arbitrary transformations: There’s nothing to prevent you or me from coming up with our own transformations. For example, maybe I like single-sided scaling but I want more separation among the placings. I could square the percentage (keeping a 0–100 scale). This would give points of 100, 96.7, 96.7, 93.7. One could cube instead of square or use logs and exponents. You’re allowed to come up with whatever transformation you want. However, some transformations will feel better than others.
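Here are the three transformations as code, applied to the 60/61/61/62 example (the function names are mine):

```python
# Three ways to normalize lap times onto a 0-100 performance scale.
def single_sided(times):
    """Fastest driver gets 100%; others get fastest/own ratio."""
    best = min(times)
    return [100.0 * best / t for t in times]

def double_sided(times):
    """Fastest gets 100%, slowest gets 0%, linear in between."""
    best, worst = min(times), max(times)
    return [100.0 * (worst - t) / (worst - best) for t in times]

def squared_single(times):
    """Single-sided scores squared (kept on a 0-100 scale) to spread placings."""
    return [s * s / 100.0 for s in single_sided(times)]

times = [60, 61, 61, 62]
single_sided(times)    # [100.0, ~98.4, ~98.4, ~96.8]
double_sided(times)    # [100.0, 50.0, 50.0, 0.0]
squared_single(times)  # [100.0, ~96.7, ~96.7, ~93.7]
```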
Decathlon scoring
The Olympic Decathlon has been around a while, and its scoring system has changed several times. One of the issues is that it tries to solve two problems that are at odds with each other: (1) it shouldn’t let specialists win the whole thing, and (2) it should reward athletes at the top end for outstanding performance. The decathlon scoring formula is INT(A × (B − P)^C). Here, B is some kind of baseline expectation and P is the performance of the athlete. In some events the inner term is P − B instead (depending on whether lower or higher is better). A and C are scaling factors that balance points between the different events.
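Here is a sketch of the formula for a timed (lower-is-better) event. The A, B, C constants below are the commonly published values for the 100 m — treat them as assumptions and check the official World Athletics scoring tables before relying on them:

```python
# Decathlon scoring for a track event: points = INT(A * (B - P) ** C),
# where P is the athlete's time. A, B, C shown are the widely cited
# 100 m constants (an assumption here, not verified against the tables).
import math

def track_points(perf_seconds, A=25.4347, B=18.0, C=1.81):
    return math.floor(A * (B - perf_seconds) ** C)

# The curve is non-linear: each tenth of a second is worth more points
# the faster you already are.
track_points(11.0)  # slower run, fewer points
track_points(10.0)  # faster run, more points
```

The exponent C > 1 is what implements goal (2): improvements at the top end earn progressively more points.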
There’s nothing wrong with decathlon scoring. It’s the way they play their game. The formula is highly tuned for their specific activity.
Z-scores make sense
One somewhat principled way of measuring and combining performances is with Z-scores. You’re probably familiar with means and standard deviations. The mean, or average, is also known as the expected value. The Z-score is a measure of how far away one is from the mean in units of the standard deviation. A Z-score of +1.0 indicates a performance 1 standard deviation above the mean. A Z-score of −0.5 indicates a performance half a standard deviation below the expected value. A person who performs above the mean on some tasks and below on others will have an aggregate Z-score equivalent to a person who performs exactly at the mean on every task (whose Z-score is 0 everywhere, neither above nor below). Summing Z-scores therefore makes some sense as a way to aggregate performance.
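A minimal sketch of this aggregation, using hypothetical lap times for the two time trials. Note the sign flip: lower lap times are better, so (mean − t) / sd makes faster drivers score positive:

```python
# Standardize each event's lap times, then sum Z-scores per driver.
from statistics import mean, stdev

def z_scores(times):
    """Z-score per driver, negated so faster (lower) times score higher."""
    m, s = mean(times), stdev(times)
    return [(m - t) / s for t in times]

tarmac = [50.1, 50.4, 50.9, 51.2]      # hypothetical Brands Hatch Indy laps
rallycross = [62.3, 61.8, 63.0, 62.5]  # hypothetical Holjes RX laps

totals = [a + b for a, b in zip(z_scores(tarmac), z_scores(rallycross))]
# Within one event, the Z-scores sum to zero by construction.
```

In this made-up field, the driver who was 2nd on tarmac but fastest at rallycross ends up with the highest total — each event contributes on a comparable scale regardless of raw lap-time magnitude.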
In my professional life, I’m a Professor, and one of the things I’m forced to do is grade students on exams.
The university doesn’t pay me to teach, they pay me to grade — Ian
This sounds a bit like a much more famous quote…
I don’t pay prostitutes for sex, I pay them to leave — Charlie Sheen
Comedy break over. My classes typically have several exams. Do I sum the exam scores? Of course not! Some exams are more difficult (have lower means) than others. Scoring 80% on a hard exam should be worth more than on an easy exam. As a result, I calculate individual performances as Z-scores and the aggregate performance as the sum of the Z-scores. Lemon-squeezy.
One issue with Z-scores is that there’s an implicit assumption that the underlying data is normally distributed. Performance distributions are generally not normal. They tend to be extreme value distributions (there is a minimum possible lap time on one side but all manner of suckage on the other). Does this invalidate the use of Z-scores? Not necessarily. However, it creates an internal weighting system that you might want to re-scale similarly to what is done to turn ranks into points. Again, as the creator of your ranking system, you’re allowed to do whatever you like.
To turn Z-scores into 100-point final grades, I end up doing something like 84 + 3.5*sumZ. The actual offset (84) and scale (3.5) are different with every class of students, and sometimes students end up with over 100 points. After I compute the numeric grade, I have a discussion with the teaching assistant and we may round up a few numbers if they are close to the borders. Such decisions depend on “extra credit”. Now you know how the sausage is made in my house. Also, I’m 100% transparent about it with the students.
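The grading recipe can be sketched like this. The exam scores are hypothetical; the offset (84) and scale (3.5) are the example values from the text and change with every class:

```python
# Per-exam Z-scores, summed, then mapped onto a 100-point scale.
from statistics import mean, stdev

def exam_z(scores):
    """Z-score per student; higher raw score is better, so no sign flip."""
    m, s = mean(scores), stdev(scores)
    return [(x - m) / s for x in scores]

hard_exam = [55, 70, 80, 95]  # hypothetical scores; lower mean = harder exam
easy_exam = [75, 85, 90, 98]  # hypothetical scores; higher mean = easier exam

sum_z = [a + b for a, b in zip(exam_z(hard_exam), exam_z(easy_exam))]
grades = [84 + 3.5 * z for z in sum_z]  # can exceed 100, as noted above
```

An average student (sum of Z-scores near 0) lands at the offset of 84; the scale factor controls how far strong and weak performances spread from there.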
Fastest driver?
So how am I going to find the overall fastest driver? You’ll have to come back to the next post to find out.