
F1 Score

A measure that combines precision and recall into a single metric, providing a balanced view of model performance.
Definition

The F1 Score is a statistical measure used to evaluate binary classification models, which distinguish between two classes (e.g., spam vs. non-spam, positive vs. negative). It is the harmonic mean of precision and recall: F1 = 2 × (precision × recall) / (precision + recall). Precision is the ratio of true positives to all predicted positives (true positives + false positives), and recall (also known as sensitivity) is the ratio of true positives to all actual positives (true positives + false negatives). The F1 Score ranges from 0 to 1: a score of 1 indicates perfect precision and recall, and a score of 0 means that precision or recall is zero, that is, the model produced no true positives. Because the harmonic mean is pulled toward the smaller of its two inputs, a model cannot earn a high F1 Score by excelling at only one of precision or recall. The F1 Score also ignores true negatives, which makes it more informative than accuracy when there is an imbalance between the positive and negative classes.
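As a minimal sketch of the definition above, the following Python computes precision, recall, and F1 from raw counts. The function name `f1_from_counts` and the example counts are illustrative assumptions, not part of any particular library. The counts are chosen so that precision is perfect while recall is poor, to show how the harmonic mean differs from a plain average.

```python
def f1_from_counts(tp, fp, fn):
    """Return (precision, recall, f1) from true positive, false positive,
    and false negative counts. Ratios with a zero denominator are
    reported as 0.0 (a common convention, assumed here)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical model: every positive prediction is correct,
# but it finds only 10 of the 100 actual positives.
precision, recall, f1 = f1_from_counts(tp=10, fp=0, fn=90)
arithmetic_mean = (precision + recall) / 2

print(f"precision={precision:.2f} recall={recall:.2f}")
print(f"arithmetic mean={arithmetic_mean:.2f} F1={f1:.2f}")
```

With these counts the arithmetic mean of precision and recall is 0.55, while F1 is about 0.18, which reflects the weak recall far more strongly.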

Examples/Use Cases:

In information retrieval, the F1 Score can be used to evaluate the effectiveness of a search algorithm. For instance, when searching for relevant documents in a database, precision would measure how many of the retrieved documents are relevant, while recall measures how many relevant documents were retrieved out of all available relevant documents. The F1 Score combines these two aspects to provide a single measure of the search algorithm's effectiveness.
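The retrieval case can be sketched with sets of document identifiers. The function `retrieval_scores` and the document IDs below are hypothetical and serve only to illustrate the calculation.

```python
def retrieval_scores(retrieved, relevant):
    """Return (precision, recall, f1) for one query, given the set of
    retrieved document IDs and the set of truly relevant document IDs."""
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Hypothetical query: 4 documents returned, 5 documents actually relevant,
# and 2 documents in common.
retrieved = {"doc1", "doc2", "doc3", "doc4"}
relevant = {"doc2", "doc4", "doc5", "doc6", "doc7"}

p, r, f = retrieval_scores(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```

Here 2 of the 4 retrieved documents are relevant (precision 0.50) and 2 of the 5 relevant documents were retrieved (recall 0.40), giving an F1 of about 0.44.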

In medical diagnostics, the F1 Score is useful for evaluating a test that identifies a disease. High precision means that most patients diagnosed by the test truly have the disease (few false positives), while high recall means that the test identifies most patients with the disease (few false negatives). The F1 Score therefore reflects both how well the test finds patients with the disease and how rarely it flags healthy individuals by mistake, which is particularly important in medical tests where the cost of false negatives and false positives can be high. It does not, however, count correctly excluded healthy individuals (true negatives) directly.
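To illustrate why this matters for a rare disease, the sketch below compares accuracy and F1 on made-up screening data: 1,000 patients, of whom 10 have the disease. The helper names and all numbers are assumptions chosen for illustration, not real clinical results.

```python
def confusion_counts(y_true, y_pred):
    """Count (tp, fp, fn, tn) for binary labels, where 1 = has the disease."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy_and_f1(y_true, y_pred):
    """Return (accuracy, f1). F1 uses the equivalent count form
    2*TP / (2*TP + FP + FN), which never involves true negatives."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Hypothetical cohort: 10 sick patients followed by 990 healthy ones.
y_true = [1] * 10 + [0] * 990

# Test A declares everyone healthy.
always_negative = [0] * 1000
# Test B finds 8 of the 10 sick patients and wrongly flags 12 healthy ones.
screening = [1] * 8 + [0] * 2 + [1] * 12 + [0] * 978

acc_a, f1_a = accuracy_and_f1(y_true, always_negative)
acc_b, f1_b = accuracy_and_f1(y_true, screening)
print(f"Test A: accuracy={acc_a:.3f} F1={f1_a:.3f}")
print(f"Test B: accuracy={acc_b:.3f} F1={f1_b:.3f}")
```

Test A reaches 99% accuracy without detecting a single patient, and its F1 is 0. Test B has slightly lower accuracy but an F1 of about 0.53, so F1 separates the two tests where accuracy does not.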

The F1 Score is widely used in machine learning and data science because it condenses precision and recall into a single number that is easy to compare across models. It is especially useful on datasets with class imbalance, where accuracy alone can look high even when the model rarely detects the positive class.
