An Intuitive Guide To The F1 Score (2024)

Demystifying a very popular classification metric

As a data scientist, I have used the F1 score extensively, as it is a great way to consider both precision and recall¹ simultaneously for classification tasks. It is a very popular metric, and is also referred to as the Sørensen–Dice coefficient². But it was never an intuitive average for me.

To refresh our memories, the formula for the F1 score is 2m1*m2/(m1 + m2), where m1 and m2 represent the precision and recall scores³.
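
For reference, the formula can be written as a small Python function. This is a minimal sketch, not a library implementation: the name `f1_from_precision_recall` is made up for this example, and raising an error when both inputs are 0 is just one reasonable way to handle the case where the formula would divide by zero.

```python
def f1_from_precision_recall(m1: float, m2: float) -> float:
    """Return 2*m1*m2/(m1 + m2), with m1 = precision and m2 = recall.

    Hypothetical helper, for illustration only.
    """
    if m1 + m2 == 0:
        # The formula divides by zero here, so the score is undefined.
        raise ValueError("F1 is undefined when precision and recall are both 0")
    return 2 * m1 * m2 / (m1 + m2)


# Example values chosen arbitrarily: precision 0.8, recall 0.6.
print(f1_from_precision_recall(0.8, 0.6))  # roughly 0.686
```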

To my mind, there are two key properties of the F1 score:

  1. The F1 score, when it is defined, lies between m1 and m2.
  2. The F1 score is never greater than the arithmetic mean of m1 and m2, and is strictly smaller whenever m1 and m2 differ (i.e. the F1 score is weighted toward the smaller of m1 and m2). It is equal to the arithmetic mean only when m1=m2.

These two properties are critical, in my opinion, because without the first, it really could not be thought of as an average at all. Without the second, it would be very unclear why one would use F1 score for classification instead of a typical arithmetic average of precision and recall. With this property, we are enforcing that a classification algorithm be decent at both precision and recall. The algorithm cannot make up for severe deficiency in one by excellence in the other, when the F1 score is used, because the score is weighted toward the lower of the two. This is not the case with the typical arithmetic mean.
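
To make the contrast concrete, here is a small sketch comparing the two averages on a balanced and a lopsided classifier. The precision and recall values are invented purely for illustration, and the helper names are hypothetical.

```python
def arithmetic_mean(m1, m2):
    """Plain average of the two scores."""
    return (m1 + m2) / 2


def f1(m1, m2):
    """F1 score as defined above: 2*m1*m2/(m1 + m2)."""
    return 2 * m1 * m2 / (m1 + m2)


# Hypothetical (precision, recall) pairs, chosen only to illustrate the point.
balanced = (0.50, 0.50)
lopsided = (0.95, 0.05)

for precision, recall in (balanced, lopsided):
    print(
        f"precision={precision:.2f} recall={recall:.2f} "
        f"mean={arithmetic_mean(precision, recall):.3f} "
        f"F1={f1(precision, recall):.3f}"
    )
```

Both classifiers have an arithmetic mean of about 0.5, but the lopsided one gets an F1 of only about 0.095, because its excellent precision cannot compensate for its poor recall.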

Neither of these two important properties exactly leaps out at you from the above definition of the F1 score.

In case others are in the same position, I thought I’d provide a clear, intuitive guide to understanding the F1 score.

We begin with a (seemingly) unrelated concept.

A common riddle, which you’ve likely heard, goes as follows:

You are running a 10k. If you run the first 5 kilometers at 10 kph (or mph, it really doesn’t matter) and the last 5 kilometers at 15 kph, what is your average speed over the entire race?

The typical instinct, honed over many years of mathematical education, is to blurt out “12.5!”, which is of course wrong. That’s the arithmetic mean, which would be correct if the race had been split into two halves based on time, not distance.

The correct answer is 10/(5/10 + 5/15) = 12 kph

Very interesting.

The reason the answer is less than 12.5 is that we actually spend more time going at the slower speed, which weights the average toward the slower speed. (To bring this point home, consider what would happen if you ran the second half at 0 kph. You would be running the race forever, and your average speed would have to be put at 0.) This is reflected in the correct answer.
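
The riddle can be checked with a few lines of Python. This is only a sketch: `average_speed` is a hypothetical helper that divides total distance by total time, and the "split by time" case assumes half an hour at each speed, a figure picked just to make the arithmetic easy.

```python
def average_speed(segments):
    """Average speed over a list of (distance, speed) pairs.

    Computed as total distance divided by total time.
    """
    total_distance = sum(distance for distance, _ in segments)
    total_time = sum(distance / speed for distance, speed in segments)
    return total_distance / total_time


# Race split by distance: 5 km at 10 kph, then 5 km at 15 kph.
by_distance = average_speed([(5, 10), (5, 15)])

# Race split by time (assumed: half an hour at each speed),
# which covers 5 km and then 7.5 km.
by_time = average_speed([(10 * 0.5, 10), (15 * 0.5, 15)])

print(by_distance)  # approximately 12
print(by_time)      # approximately 12.5
```

Only when the halves are equal in time does the arithmetic mean of the two speeds give the right answer.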

Let’s solve this for a race of arbitrary length at two arbitrary speeds.

Consider a race of length x, in which we go at speed m1 for the first half of the distance (x/2) and then at speed m2 for the second half. What is our average speed over the entire race? The calculation proceeds as follows.

  average speed = x / ((x/2)/m1 + (x/2)/m2)
                = x / ((x*m2 + x*m1)/(2m1*m2))
                = x / (x(m1 + m2)/(2m1*m2))
                = 2m1*m2/(m1 + m2)

In line 1, the numerator is the total distance covered, and the denominator is the time taken. We proceed to line 2 by cross multiplying and moving the 2 to the denominator. Next, we factor out the x, and conclude by cancelling the x and rearranging.

Perhaps you now see the connection between this common riddle and the F1 score. The formula for the average speed simplifies to precisely the formula for the F1 score, where m1 and m2 are taken to be precision and recall.
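
The equivalence is easy to confirm numerically. In the sketch below, `race_average_speed` and `f1` are hypothetical helpers and the precision and recall values are made up. The quantity being computed is the harmonic mean of the two numbers, so Python's standard `statistics.harmonic_mean` is included as an independent cross-check.

```python
from statistics import harmonic_mean


def race_average_speed(x, m1, m2):
    """Average speed over a race of length x, run half-and-half by distance."""
    time_first_half = (x / 2) / m1
    time_second_half = (x / 2) / m2
    return x / (time_first_half + time_second_half)


def f1(m1, m2):
    """F1 score: 2*m1*m2/(m1 + m2)."""
    return 2 * m1 * m2 / (m1 + m2)


precision, recall = 0.8, 0.6  # made-up values for illustration

# The race length cancels out, so any positive x gives the same answer.
print(race_average_speed(10, precision, recall))
print(race_average_speed(42.195, precision, recall))
print(f1(precision, recall))
print(harmonic_mean([precision, recall]))
```

All four printed values agree up to floating-point rounding.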

The advantage of using the story of what we might call “the race of two halves” to guide our intuition is that now the two key properties of the F1 score mentioned above emerge naturally. Without resorting to abstract mathematical proofs, these properties are intuitively obvious from the story. Indeed, they could not be otherwise.

Specifically:

  1. The F1 score cannot be larger than the larger of the precision and recall, and it cannot be smaller than the smaller of the two. Thus, it is truly a kind of average. This is clearly true because your average speed across the entire race cannot be faster than your fastest instantaneous speed during the race, nor can it be slower than your slowest instantaneous speed.
  2. The F1 score is closer to the smaller of the precision and recall than it is to the larger, if indeed they are not equal. Thus the F1 score cannot be greater than the arithmetic mean (and is strictly smaller whenever the two differ). This is obvious because you spend more time going at the slower speed, so the average gets weighted down towards the slower speed. Of course, in the case where both speeds are equal, this point is moot.
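
Before turning to formal arguments, both properties can be spot-checked by brute force. The sketch below scans a grid of nonzero precision and recall values. The grid spacing and the small floating-point tolerance are arbitrary choices, and a scan like this is a sanity check, not a proof.

```python
def f1(m1, m2):
    """F1 score: 2*m1*m2/(m1 + m2)."""
    return 2 * m1 * m2 / (m1 + m2)


TOLERANCE = 1e-12  # allowance for floating-point rounding
grid = [i / 100 for i in range(1, 101)]  # 0.01, 0.02, ..., 1.00

violations = []
for m1 in grid:
    for m2 in grid:
        score = f1(m1, m2)
        # Property 1: the score lies between the two inputs.
        between = min(m1, m2) - TOLERANCE <= score <= max(m1, m2) + TOLERANCE
        # Property 2: the score never exceeds the arithmetic mean.
        below_mean = score <= (m1 + m2) / 2 + TOLERANCE
        if not (between and below_mean):
            violations.append((m1, m2, score))

print(len(violations))  # 0: no pair on the grid breaks either property
```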

It is true that these properties can be formally proven directly from the definition of the F1 score and, for completeness, I will include formal proofs below, but they are so much more intuitive as they emerge from the story.

Okay, time for proofs.

Deep breath.

To begin the proof of the first property, we note that if exactly one of m1 and m2 is 0, then F1=0 and property 1 is trivially true. In the case that m1 and m2 are both 0, the F1 score is undefined.

Now we need only concern ourselves with the case where neither m1 nor m2 is 0.

We will use contradiction to prove property 1:

Without loss of generality, let m1 ≦ m2. Thus, we take m2 to be the larger of the two, if indeed a larger exists.

Then, we assume, to the contrary of what we are trying to prove, that F1 = 2m1*m2/(m1 + m2) > m2.

Since we know m2 is non-zero, we can divide both sides by m2, to yield: 2m1/(m1 + m2) > 1.

Multiplying both sides by (m1+m2) and combining like terms yields: m1 > m2, which contradicts our assumption that m1 ≦ m2.

Thus, we conclude that F1 ≦ m2.

By an almost identical argument, we can conclude that F1 ≧ m1. We assume that F1 < m1, divide both sides by m1, multiply both sides by (m1+m2) and combine like terms. This yields m2 < m1, which once again contradicts our assumption that m1 ≦ m2.

Thus, we have shown that F1 must lie between m1 and m2 and the first property is proven.

Further, in the case that m1=m2, it is clear that F1=m1=m2, since F1=2m1*m2/(m1+m2) = 2m1*m2/(m2+m2)=m1=m2.

To prove property 2, we once again use the method of contradiction:

We assume that F1 = 2m1*m2/(m1+m2) > (m1+m2)/2, where (m1+m2)/2 is the arithmetic mean.

This simplifies to 4m1*m2 > (m1+m2)². If we multiply out and combine like terms, we obtain 0 > m1² +m2² - 2m1*m2 = (m1-m2)².

This is impossible since the square of a real number cannot be negative. Further, running the same steps with equality in place of the strict inequality gives 0 = (m1-m2)², so equality holds only if m1=m2. Thus we have shown that F1 is equal to the arithmetic mean if and only if m1=m2 (which accords with the fact that in this case F1=m1=m2, as mentioned above), and F1 is smaller than the arithmetic mean otherwise, which is property two.
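
The same algebra also tells us exactly how far F1 falls below the arithmetic mean: rearranging (m1+m2)² - 4m1*m2 = (m1-m2)² gives a gap of (m1-m2)²/(2(m1+m2)). The sketch below checks this with exact fractions so that rounding plays no part. The values 4/5 and 3/5 are arbitrary, and the helper names are hypothetical.

```python
from fractions import Fraction


def f1(m1, m2):
    """F1 score: 2*m1*m2/(m1 + m2)."""
    return 2 * m1 * m2 / (m1 + m2)


def gap_below_mean(m1, m2):
    """Arithmetic mean minus F1."""
    return (m1 + m2) / 2 - f1(m1, m2)


# Arbitrary example values, held as exact fractions.
m1, m2 = Fraction(4, 5), Fraction(3, 5)

gap = gap_below_mean(m1, m2)
closed_form = (m1 - m2) ** 2 / (2 * (m1 + m2))

print(gap, closed_form)  # 1/70 1/70
```

The gap is a squared quantity divided by a positive one, so it is never negative, and it is zero exactly when m1=m2.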

While the above proofs are not particularly difficult, I’m sure you will agree with me that the story provides much needed intuition.

So the next time you encounter the F1 score, think of it as your average speed over a race in which you ran until the halfway mark at a rate equal to the precision, and the second half at a rate equal to the recall.

¹ Precision and recall go by several names. Precision is also known as user’s accuracy. Recall is also known as sensitivity and as producer’s accuracy. Precision and recall are also related to the concepts of errors of omission and commission. The variety of names these concepts go by is an indication of how many disciplines there are in which these ideas are critical.

² Wikipedia contributors. (2021, September 28). Sørensen–Dice coefficient. In Wikipedia, The Free Encyclopedia. Retrieved 16:58, October 10, 2021, from https://en.wikipedia.org/w/index.php?title=S%C3%B8rensen%E2%80%93Dice_coefficient&oldid=1047086456

³ https://deepai.org/machine-learning-glossary-and-terms/f-score

Author: Ms. Lucile Johns
