Evaluating a machine learning model is as important as building it, if not more so. The only way to know whether our model draws accurate conclusions is to evaluate its performance against a number of criteria and analyze the results.
This article aims to give readers a brief overview of the various performance evaluation metrics used for classification. Before moving on to the metrics themselves, let’s look at the basic terms and definitions you’ll need to follow the rest of the article.
A true positive is an outcome where the model correctly predicts the positive class. Example: A COVID-19 positive person is correctly predicted to be positive.
A true negative is an outcome where the model correctly predicts the negative class. Example: A COVID-19 negative person is correctly predicted to be negative.
A false positive is an outcome where the model incorrectly predicts the positive class. Example: A COVID-19 negative person is incorrectly predicted to be positive.
A false negative is an outcome where the model incorrectly predicts the negative class. Example: A COVID-19 positive person is incorrectly predicted to be negative.
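To make the four outcomes concrete, here is a minimal plain-Python sketch that tallies them from lists of actual and predicted labels. The helper `count_outcomes` and the sample labels are hypothetical and only for illustration. The sketch assumes a binary problem where `1` marks the positive class (e.g., COVID-19 positive) and any other label is negative.

```python
def count_outcomes(y_true, y_pred, positive=1):
    """Hypothetical helper: count (TP, TN, FP, FN) for a binary problem.

    y_true and y_pred are equal-length sequences of labels; any label
    other than `positive` is treated as the negative class.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if actual == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return tp, tn, fp, fn


# Hypothetical test results: 1 = COVID-19 positive, 0 = negative.
actual    = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]
print(count_outcomes(actual, predicted))  # (3, 3, 1, 1)
```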
Classification accuracy, or simply accuracy, is the ratio of the number of correct predictions to the total number of input samples.
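In terms of the outcomes defined above, accuracy can be written as:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
where TP, TN, FP and FN are the numbers of true positives, true negatives, false positives and false negatives.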
Accuracy can give misleading results, especially with imbalanced classes, i.e., when some classes have many more samples than others: a model that always predicts the majority class can still score a high accuracy.
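A minimal plain-Python sketch of the accuracy formula, followed by the imbalanced-class pitfall. The `accuracy` helper and the 95/5 class split are hypothetical; the arguments are the counts of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN).

```python
def accuracy(tp, tn, fp, fn):
    """Accuracy = (TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no samples")
    return (tp + tn) / total


# Hypothetical imbalanced dataset: 95 negatives, 5 positives.
# A "model" that always predicts negative gets every negative right
# (TN = 95) and misses every positive (FN = 5).
print(accuracy(tp=0, tn=95, fp=0, fn=5))  # 0.95
```

The 95% accuracy hides the fact that this model never detects a single positive case. For label arrays, scikit-learn provides `sklearn.metrics.accuracy_score` for the same computation.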
A confusion matrix is a table that summarizes the performance of a classification algorithm: for each actual class, it shows how many samples were predicted as each class, so for a binary problem it holds the counts of true positives, false negatives, false positives and true negatives. It helps overcome the limitations of classification accuracy. It not only helps us decide whether the model is appropriate, but also tells us where the model performs poorly, so that we can work on those specific areas.
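A confusion matrix can be sketched as a nested list, assuming rows are actual classes and columns are predicted classes (the convention scikit-learn’s `sklearn.metrics.confusion_matrix` also uses, though some texts transpose it). The `build_confusion_matrix` function below is a hypothetical plain-Python helper, not the scikit-learn function, and the labels are made up.

```python
def build_confusion_matrix(y_true, y_pred, labels):
    """Hypothetical helper: rows = actual class, columns = predicted class,
    both in the order given by `labels`."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for actual, predicted in zip(y_true, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return matrix


actual    = ["pos", "pos", "pos", "neg", "neg", "neg", "neg", "pos"]
predicted = ["pos", "neg", "pos", "neg", "pos", "neg", "neg", "pos"]
labels = ["pos", "neg"]
for label, row in zip(labels, build_confusion_matrix(actual, predicted, labels)):
    print(f"actual {label}: {row}")
# actual pos: [3, 1]   -> 3 true positives, 1 false negative
# actual neg: [1, 3]   -> 1 false positive, 3 true negatives
```

Off-diagonal cells are the errors, so a large value in one of them points to the specific kind of mistake the model makes most often.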
Error rate (ERR) is calculated as the number of incorrect predictions divided by the total number of samples in the dataset. Ideally, it should be 0. Its maximum value is 1.
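Error rate follows directly from the four outcome counts. A minimal plain-Python sketch, with a hypothetical helper name and counts: because correct and incorrect predictions together cover every sample, the error rate is always 1 minus the accuracy.

```python
def error_rate(tp, tn, fp, fn):
    """ERR = (FP + FN) / (TP + TN + FP + FN), i.e. 1 - accuracy."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no samples")
    return (fp + fn) / total


# Hypothetical counts: 3 TP, 3 TN, 1 FP, 1 FN -> 2 errors out of 8.
print(error_rate(tp=3, tn=3, fp=1, fn=1))  # 0.25
```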
Recall is the fraction of actual positive events that the model predicts correctly.
In other words, when the actual value is positive, recall is how often the prediction is also positive. It is calculated as the number of correct positive predictions divided by the total number of actual positives: Recall = True Positives / (True Positives + False Negatives). It is also called sensitivity or true positive rate (TPR). In an ideal scenario, it should be 1.
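A minimal plain-Python sketch of recall computed from counts of true positives (TP) and false negatives (FN). The helper name and the counts are hypothetical.

```python
def recall(tp, fn):
    """Recall = TP / (TP + FN): share of actual positives the model found."""
    if tp + fn == 0:
        raise ValueError("no actual positives; recall is undefined")
    return tp / (tp + fn)


# Hypothetical: 8 infected people, 6 flagged as positive, 2 missed.
print(recall(tp=6, fn=2))  # 0.75
```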
Precision is the fraction of predicted positive events that are actually positive.
In other words, when the model predicts positive, precision is how often that prediction is correct: Precision = True Positives / (True Positives + False Positives). Ideally, its value should be 1.
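A minimal plain-Python sketch of precision computed from counts of true positives (TP) and false positives (FP). The helper name and the counts are hypothetical.

```python
def precision(tp, fp):
    """Precision = TP / (TP + FP): share of positive predictions that are right."""
    if tp + fp == 0:
        raise ValueError("no positive predictions; precision is undefined")
    return tp / (tp + fp)


# Hypothetical: 10 people predicted positive, 6 of them actually infected.
print(precision(tp=6, fp=4))  # 0.6
```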
If we push precision as high as possible, i.e., close to 1 (for example, by predicting positive only when the model is very confident), recall usually drops, because more actual positives are missed and the number of false negatives rises. Many applications need precision and recall to be balanced with each other, and for such scenarios we calculate another metric called the F1 score.
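The trade-off can be illustrated under one assumption not stated above: the model outputs a score (e.g., a probability of being positive), and we predict positive when the score is at or above a chosen threshold. The scores, labels and the helper `precision_recall_at` below are hypothetical.

```python
def precision_recall_at(scores, y_true, threshold):
    """Predict positive when score >= threshold; return (precision, recall)."""
    pairs = list(zip(scores, y_true))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)
    # Use NaN when a ratio is undefined (no positive predictions / positives).
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return precision, recall


# Hypothetical model scores and true labels (1 = positive).
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20]
y_true = [1,    1,    0,    1,    1,    0,    1,    0]

for t in (0.3, 0.5, 0.85):
    p, r = precision_recall_at(scores, y_true, t)
    print(f"threshold={t}: precision={p:.2f}, recall={r:.2f}")
# threshold=0.3: precision=0.71, recall=1.00
# threshold=0.5: precision=0.80, recall=0.80
# threshold=0.85: precision=1.00, recall=0.40
```

Raising the threshold in this toy example lifts precision to 1.00 but cuts recall to 0.40, because the positives the model is less sure about turn into false negatives.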
The F1 score is the harmonic mean of precision and recall, so it is high only when both are high. It is sometimes loosely described as a weighted average of precision and recall.
F1-Score = 2 * (Precision * Recall) / (Precision + Recall)
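A minimal plain-Python sketch of the F1 formula. The helper name is hypothetical, and returning 0.0 when precision and recall are both zero is a convention chosen here to avoid dividing by zero.

```python
def f1_score(precision, recall):
    """F1 = 2 * (precision * recall) / (precision + recall)."""
    if precision + recall == 0:
        return 0.0  # convention chosen here: both zero -> F1 of 0
    return 2 * precision * recall / (precision + recall)


print(f"{f1_score(0.6, 0.75):.3f}")  # 0.667
print(f"{f1_score(1.0, 0.1):.3f}")   # 0.182
```

In the second call, the arithmetic mean of the two values would be 0.55, but the harmonic mean stays close to the weaker metric, which is why F1 rewards a balance between precision and recall.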
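The F-beta score is similar to the F1 score, except that it does not give equal weight to precision and recall. A parameter beta lets us give more weight to either of them: beta > 1 favors recall, beta < 1 favors precision, and beta = 1 gives back the F1 score.
F-beta = (1 + beta^2) * (Precision * Recall) / (beta^2 * Precision + Recall)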
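A minimal plain-Python sketch of the F-beta formula. The helper name and the precision/recall values are hypothetical, and the zero-denominator convention is a choice made here. scikit-learn also offers `sklearn.metrics.fbeta_score`, which works on label arrays rather than on precision and recall directly.

```python
def fbeta_score(precision, recall, beta):
    """F-beta = (1 + beta^2) * P * R / (beta^2 * P + R).

    beta > 1 weights recall more heavily; beta < 1 weights precision more.
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    b2 = beta ** 2
    denominator = b2 * precision + recall
    if denominator == 0:
        return 0.0  # convention chosen here
    return (1 + b2) * precision * recall / denominator


p, r = 0.9, 0.5  # hypothetical high-precision, low-recall model
for beta in (0.5, 1, 2):
    print(f"beta={beta}: {fbeta_score(p, r, beta):.3f}")
# beta=0.5: 0.776
# beta=1: 0.643
# beta=2: 0.549
```

Because this model is strong on precision and weak on recall, the precision-weighted beta = 0.5 scores it highest and the recall-weighted beta = 2 scores it lowest.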
Specificity (SP) is calculated as the number of correct negative predictions divided by the total number of negatives. It is also called true negative rate (TNR). Ideally, specificity should be 1.
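A minimal plain-Python sketch of specificity computed from counts of true negatives (TN) and false positives (FP). The helper name and the counts are hypothetical.

```python
def specificity(tn, fp):
    """Specificity (TNR) = TN / (TN + FP): share of actual negatives found."""
    if tn + fp == 0:
        raise ValueError("no actual negatives; specificity is undefined")
    return tn / (tn + fp)


# Hypothetical: 100 healthy people, 90 correctly predicted negative.
print(specificity(tn=90, fp=10))  # 0.9
```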
False Positive Rate
False positive rate (FPR) is calculated as the number of incorrect positive predictions divided by the total number of negatives. Its ideal value is 0.
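A minimal plain-Python sketch of the false positive rate computed from counts of false positives (FP) and true negatives (TN). The helper name and the counts are hypothetical. Since both FPR and specificity divide by the total number of actual negatives, FPR always equals 1 minus specificity.

```python
def false_positive_rate(fp, tn):
    """FPR = FP / (FP + TN), which equals 1 - specificity."""
    if fp + tn == 0:
        raise ValueError("no actual negatives; FPR is undefined")
    return fp / (fp + tn)


# Hypothetical: 100 healthy people, 10 incorrectly predicted positive.
print(false_positive_rate(fp=10, tn=90))  # 0.1
```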
That covers the basics of the main performance evaluation metrics used in classification. I hope this helped clear them up!
You can find me on LinkedIn at: https://www.linkedin.com/in/manasvita-sharma-b35315171/