In this post, we will discuss two common metrics:
AUC Score and Average Precision.
In Part 2, we will cover the following topics:
What is Average Precision?
Why use Average Precision?
Different name for Average Precision
Average Precision (AP), like AUC Score, measures the area under a curve. However, instead of the ROC curve used by the AUC score, AP measures the area under the Precision-Recall (PR) curve.
The two components of AP, precision and recall, are calculated as:
Recall = TPR = TP / (TP + FN)
Precision = TP / (TP + FP)
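The two formulas above can be sketched in a few lines of Python, using hypothetical confusion-matrix counts chosen purely for illustration:

```python
def recall(tp, fn):
    # Recall (= TPR): fraction of actual positives that were found
    return tp / (tp + fn)

def precision(tp, fp):
    # Precision: fraction of positive predictions that were correct
    return tp / (tp + fp)

# Hypothetical counts: 80 true positives, 20 false negatives, 40 false positives
print(recall(80, 20))     # 80 / 100 = 0.8
print(precision(80, 40))  # 80 / 120 ≈ 0.667
```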
The following figures show example PR curves and the corresponding areas under the curve:
For a balanced dataset, a random classifier has an AP of 0.5, the same as its AUC. For an imbalanced dataset, the baseline random classifier has an AP equal to the fraction of positive samples: AP = Number of Positives / (Number of Positives + Number of Negatives).
A skilled classifier, on the other hand, maintains both high precision and high recall across different decision thresholds. Therefore, a model whose PR curve is closer to the top-right corner is preferred.
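To make the idea of a PR curve across decision thresholds concrete, here is a minimal sketch that computes precision and recall at a few thresholds for a small set of hypothetical scores and labels (the numbers are invented for illustration):

```python
# Hypothetical predicted scores and true labels
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1,   1,   0,   1,   0,   0]

def pr_at_threshold(scores, labels, t):
    # Predict positive whenever the score is at least t
    tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
    prec = tp / (tp + fp) if tp + fp else 1.0
    rec = tp / (tp + fn)
    return prec, rec

for t in [0.85, 0.65, 0.35]:
    p, r = pr_at_threshold(scores, labels, t)
    print(f"threshold={t}: precision={p:.2f}, recall={r:.2f}")
```

Lowering the threshold raises recall (1/3 → 2/3 → 1.0 here) while precision falls (1.00 → 0.67 → 0.60); the PR curve traces exactly this trade-off.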
Why Average Precision?
The difference between AUC and AP lies solely in replacing AUC’s FPR component with AP’s Precision component. Below is a recap of the two components of AUC (left) and AP (right):
AUC Score:
TPR = TP / (TP + FN)
FPR = FP / (FP + TN)
Average Precision:
Recall = TPR = TP / (TP + FN)
Precision = TP / (TP + FP)
As the contrast between the two pairs of formulas shows, AP does not include TN in its calculation.
This means AP is unconcerned with True Negatives (i.e., classifying unburned area as unburned).
While AUC’s FPR component tries to maximize TN for a fixed number of FP, AP’s Precision component tries to minimize FP for a fixed number of TP.
Therefore, for an imbalanced dataset like in the example given, AP is more aligned with our goal of classifying the positive class, the burned area.
Different Name for Average Precision
Average Precision uses a rectangular approximation to calculate the area under the Precision-Recall curve. According to the machine learning library sklearn’s documentation, Average Precision is calculated as:

AP = Σ_n (R_n − R_{n−1}) × P_n

where P_n and R_n are the precision and recall at the n-th threshold.
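This step-wise sum, AP = Σ (R_n − R_{n−1}) × P_n, can be implemented in a few lines. The sketch below sorts by descending score, treats each prediction as a threshold, and accumulates rectangular slices; for untied scores it should agree with `sklearn.metrics.average_precision_score`, though this is an illustrative re-implementation, not the library’s code:

```python
def average_precision(labels, scores):
    # Sort indices by descending score; each position acts as a threshold
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    total_pos = sum(labels)
    tp = fp = 0
    ap = 0.0
    prev_recall = 0.0
    for i in order:
        if labels[i] == 1:
            tp += 1
        else:
            fp += 1
        precision = tp / (tp + fp)
        recall = tp / total_pos
        # (R_n - R_{n-1}) * P_n: one rectangular slice under the PR curve
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap

# Tiny hand-checkable example (hypothetical labels and scores)
print(average_precision([1, 0, 1, 1, 0], [0.9, 0.8, 0.7, 0.6, 0.5]))  # 29/36 ≈ 0.806
```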
Besides the rectangular approximation used by AP, other common methods to calculate the area under the PR curve are the lower trapezoid estimator and the interpolated median estimator.
Depending on the calculation method used, the metric that measures the area under the PR curve may go by different names (e.g., AUC-PR).
REFERENCES:
Wikipedia. (n.d.). Receiver Operating Characteristic. Retrieved from Wikipedia: https://en.wikipedia.org/wiki/Receiver_operating_characteristic
Flatley, M. (2021). AUROC: Area Under the Receiver Operating Characteristic. Retrieved from Morioh: https://morioh.com/p/189aefce710f
Draelos, R. (2019, February 23). Measuring Performance: AUC (AUROC). Retrieved from Glass Box Machine Learning and Medicine: https://glassboxmedicine.com/2019/02/23/measuring-performance-auc-auroc/
Draelos, R. (2019, March 2). Measuring Performance: AUCPR and Average Precision. Retrieved from Glass Box Machine Learning and Medicine: https://glassboxmedicine.com/2019/03/02/measuring-performance-auprc/
Brownlee, J. (2020, January 6). ROC Curves and Precision-Recall Curves for Imbalanced Classification. Retrieved from Machine Learning Mastery: https://machinelearningmastery.com/roc-curves-and-precision-recall-curves-for-imbalanced-classification/
Steen, D. (2020, September 19). Precision-Recall Curves. Retrieved from Medium: https://medium.com/@douglaspsteen/precision-recall-curves-d32e5b290248
Chou, S.-Y. (2020, April 25). Compute the AUC of Precision-Recall Curve. Retrieved from GitHub: https://sinyi-chou.github.io/python-sklearn-precision-recall/
scikit-learn developers. (2022). Average Precision Score. Retrieved from scikit-learn: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.average_precision_score.html