Noise: A Flaw in Human Judgment

I have started reading Noise, a book written jointly by Daniel Kahneman, Olivier Sibony and Cass R. Sunstein. I have already read Kahneman’s Thinking, Fast and Slow. I also plan to read something by the other two authors, but that will happen at some undefined time in the future.

With Thinking, Fast and Slow, I reflected on cognitive biases. Kahneman gives numerous examples and explains the mechanisms by which biases creep into our minds and condition our decisions. He talks about economics and a number of other real-world situations, but he never talks about figure skating. In general he talks very little about sports: if I remember correctly there is a mention of baseball, maybe another mention of some sport elsewhere, and that’s it. But he doesn’t need to talk about figure skating directly. If we understand how certain psychological mechanisms work, we can apply them to other contexts as well. I did it with figure skating, here (in Italian), here and here (again in Italian), and I mentioned the book in several other posts.

This book, Noise (I’m reading the Italian translation, Rumore), goes a step further. So far I have read the first two chapters, and already there are numerous passages I would like to quote. Obviously I can’t quote them all because of copyright, and although short quotations can be included in a critical text, it is always a fine line to walk. Therefore, I refer you to the preview published by Amazon. Read the preview thinking about figure skating.

https://www.amazon.com/Noise-Human-Judgment-Daniel-Kahneman/

In the Introduction the authors explain that human judgments of any kind are influenced by noise, a problem that they analyze in detail in the following 400+ pages. According to them, one way to reduce noise is to rely on something objective rather than on human judgment. They talk about “rules, formulas, and algorithms over humans” (p. 8); for figure skating we could talk about the use of technology, because the problem of discretion is enormous.

A very concrete example is in Chapter 1. The situation discussed in the book, the one relating to criminal offenses, is certainly much more serious than the evaluations in a figure skating competition, but even in figure skating the lives of many people are at stake. The judges, with their marks, assign (or do not assign) medals or positions of prestige. Skaters may lose or gain sponsorships based on the results. For some of them the difference could be huge. And it is important to remember that if one skater is favored by an incorrect application of the rules, someone else is disadvantaged.

I gave an example of a rule applied very arbitrarily by judges in this post, where the judges lowered Hanyu’s PCS, believing he had made a serious mistake, when worse mistakes made by other skaters weren’t considered serious mistakes. And this didn’t just happen across different competitions; it also happened within the same competition, the Olympic Games, when a popped jump by Hanyu was judged a serious mistake, but popped jumps by Chen (4T+1Eu+1F), Kagiyama (4T+1Eu+2S) and Uno (3A+1Eu+1F) were not, and neither was a step out by Kagiyama or a hand on the ice by Uno in the middle of a combination (4T+3T, SP).

Discretion must be reduced as much as possible because it leads to “inexplicable variations” (p. 15): “in sentencing” in the case of Noise, in scoring in the case of figure skating. And inexplicable variations lead to injustice. Continuing their account of what happened in the American justice system, the authors speak of the “use of ‘computers as an aid’” to reach more correct evaluations. This was in the 70s. It was a different context, and the assimilation of these ideas was partial and difficult, but if the problem was known over forty years ago, how is it that no one in the ISU noticed it? How is it that nobody has noticed, even though a (too small) part of the press and the public is asking for better technologies and for less subjectivity in judging? The book talks about the introduction of guidelines, with the possibility for judges to make justified exceptions, and stresses that the guidelines, which were mandatory, not optional, should be followed, and that the judges should justify their work.

I will skip all the explanations of how the law evolved, because not everyone liked the guidelines (they are still interesting, and I invite you to read them if you have not already done so), and move to the end of the chapter. The authors recall that judging is difficult, a difficulty found in every human situation in which a judgment is necessary (and therefore also in figure skating); that there is disagreement among judges; that this disagreement is greater than one might believe without serious analysis; that injustices arise from this disagreement; but also that it is possible to reduce the problem.

But first we must recognize that there is a problem, instead of just ignoring or denying it, then think about how to intervene, and then act. All difficult things. Moreover, while in the justice system there is at least the awareness that there should be greater uniformity of judgment, in figure skating it is easy to hide behind personal taste.

That different people have different tastes is normal, “But diversity of tastes can help account for errors if a personal taste is mistaken for a professional judgment” (pp. 27-28). I have said this on other occasions: if I review a book for FantasyMagazine, I behave like a critic and try to evaluate the positive and negative aspects of the book. If I rate it on Goodreads, I behave like a reader, and I award stars based on how much I enjoyed reading it. Sometimes the two judgments do not coincide. I do my best to separate the objective evaluation from my tastes (I don’t know if I’m able to do it, but at least I try). I suspect that some people don’t even realize that there is a difference between an objective evaluation and their tastes, and this gives ample room to the discretion of the judges and distorts the results of the competitions.

And that is not all. Many point to the presence of nine judges, with the highest and the lowest mark discarded, as the guarantee that the final evaluation is correct. I remind you that all judges can make mistakes, and if you don’t believe me I invite you to look at Nathan Chen’s free skate at the 2019 Grand Prix Final, where a choreo sequence containing a stumble was awarded two +4s and seven +5s. With a stumble, at least bullet 3 (effortless throughout) is missing, so the starting mark cannot be higher than +3 (assuming at least three bullets are met), and from there at least 1 must be deducted (the reduction can go up to 3). Therefore the final mark cannot be higher than +2, not if the rules are respected. So all the judges on a panel can make the same mistake, and in this case discarding two marks did not eliminate it. But, as the authors of Noise remind us, “In noisy systems, errors do not cancel out. They add up” (p. 29).
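To make the arithmetic of the choreo sequence example concrete, here is a minimal Python sketch. It is a simplification and not the official ISU calculation: it works directly on the judges' raw marks (the real system first converts them into GOE values), and the names trimmed_mean, START_CAP and MIN_REDUCTION are hypothetical, chosen only for this illustration. The cap of +3 and the minimum reduction of 1 are the values given in the paragraph above.

```python
def trimmed_mean(marks):
    """Average the marks after discarding one highest and one lowest."""
    if len(marks) < 3:
        raise ValueError("need at least three marks to discard two")
    ordered = sorted(marks)
    kept = ordered[1:-1]  # drop the single lowest and the single highest mark
    return sum(kept) / len(kept)


# The nine marks from the example: two +4 and seven +5.
panel_marks = [4, 4, 5, 5, 5, 5, 5, 5, 5]

# Assumptions taken from the text: with the "effortless throughout" bullet
# missing the starting mark is capped at +3, and a stumble removes at least 1.
START_CAP = 3
MIN_REDUCTION = 1
max_allowed = START_CAP - MIN_REDUCTION

panel_result = trimmed_mean(panel_marks)
excess = panel_result - max_allowed

print(f"trimmed mean of the panel: {panel_result:.2f}")
print(f"highest mark the rules allow: {max_allowed:+d}")
print(f"excess that trimming did not remove: {excess:.2f}")
```

With these numbers the trimmed mean is about 4.86, almost three full grades above the +2 that the rules would allow. Discarding the extremes only protects against one or two outlying judges; it does nothing when the whole panel errs in the same direction.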

How does this apply to figure skating? I think we all agree that the judging system is complicated, and the more complicated a judging system is, the more it is subject to influences of all kinds. In this case one source of noise is how well the technical elements are executed. The technical elements have a base value (BV) and a GOE. The BV is assigned automatically when the technical panel identifies the element (we would also have to discuss the identification, by which I mean the completeness of the rotations and the levels of spins and step sequences, but for now I will pass over that), while the GOE is awarded by the judges. The GOE of one element is independent of the GOE of another. A skater can do one element well and miss another; the two things are not correlated. And, unless there is a serious error, the PCS should not be affected by whether the technical elements were done correctly or not. Yet the judges are influenced.

After Hanyu missed the salchow in the Beijing SP, five judges awarded +4 to a perfect 4T+3T combination, and one awarded +3. This is the noise. Hanyu missed a jump, the judges convinced themselves he was having a bad day, and lowered all of his scores. Chen landed all his jumps, and two of his quads are difficult (let’s overlook the fact that a quadruple salchow preceded by a sequence of steps, as Hanyu typically does it, is more difficult than a quadruple lutz preceded by a 10-second run), so there is already a background noise that prompts the judges to give him higher GOEs, although the difficulty of the element should be recognized only by the BV and not by the GOE, whose job is to say how well the elements were performed. The consequence? All of Chen’s marks were raised. The two mistakes added up (not only with Chen; Kagiyama and Uno also saw their marks raised), and Hanyu was heavily penalized.

What does this mean? That we cannot trust the judges’ scores. The result of the Men’s competition in Beijing was conditioned by a series of cognitive biases and technical errors that distorted it. Not everything can be solved, but the introduction of objective evaluation systems would help produce more correct results, together with the simplification of the rules and the obligation, for all aspiring judges and also for those already qualified, to take an exam on cognitive biases.

I don’t know what else I will find in the pages of Noise, but I have a feeling it will be a very interesting read.
