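Python's sklearn.metrics module provides evaluation metrics for many tasks. For classification these include the confusion matrix, average per-class accuracy, per-class accuracy, overall accuracy, and F1-score; it also has built-in functions for regression, clustering, and other tasks.
1. Classification - Confusion Matrix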
from sklearn.metrics import confusion_matrix
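Computes the confusion matrix to evaluate classification accuracy.
Let ${ C }$ denote the confusion matrix. Its entry ${ C_{ij} }$ is the number of samples with gt_label=i and pred_label=j, where i and j are class labels.
In binary classification, the number of true negatives is ${ C_{0,0} }$, false negatives is ${ C_{1,0} }$, true positives is ${ C_{1,1} }$, and false positives is ${ C_{0,1} }$.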
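The definition of ${ C_{ij} }$ can be checked by counting by hand. The following is a minimal sketch, not how sklearn implements it; manual_confusion_matrix is a hypothetical helper introduced only for illustration, and it assumes the classes are the sorted union of the labels seen in either list.

```python
import numpy as np
from sklearn.metrics import confusion_matrix


def manual_confusion_matrix(gt_labels, pred_labels):
    """Count C[i, j] = number of samples with gt label i and predicted label j."""
    # Assumption: classes are the sorted union of all labels that appear.
    classes = sorted(set(gt_labels) | set(pred_labels))
    index = {label: k for k, label in enumerate(classes)}
    C = np.zeros((len(classes), len(classes)), dtype=int)
    for gt, pred in zip(gt_labels, pred_labels):
        C[index[gt], index[pred]] += 1  # row = ground truth, column = prediction
    return C


gt_labels = [0, 1, 0, 1]
pred_labels = [1, 1, 1, 0]
C = manual_confusion_matrix(gt_labels, pred_labels)

# Binary case: read the four counts off the matrix.
tn, fp, fn, tp = C[0, 0], C[0, 1], C[1, 0], C[1, 1]

# The hand count should agree with sklearn on this input.
matches_sklearn = bool((C == confusion_matrix(gt_labels, pred_labels)).all())
```

Rows index the ground-truth label and columns index the predicted label, which is why false positives sit at ${ C_{0,1} }$ and false negatives at ${ C_{1,0} }$.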
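Usage:
C = confusion_matrix(gt_labels, pred_labels, labels=None, sample_weight=None)
# C is the n_classes x n_classes confusion matrix
where:
[1] - gt_labels - ground-truth label values
[2] - pred_labels - label values predicted by the classifier
[3] - labels - list of labels used to index the confusion matrix; it can reorder the labels or select a subset. If None, the labels that appear at least once in gt_labels or pred_labels are used, in sorted order.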
Example 1:
from sklearn.metrics import confusion_matrix
gt_labels = [2, 0, 2, 2, 0, 1]
pred_labels = [0, 0, 2, 2, 0, 2]
confusion_matrix(gt_labels, pred_labels)
# array([[2, 0, 0],
# [0, 0, 1],
# [1, 0, 2]])
Example 2:
from sklearn.metrics import confusion_matrix
gt_labels = ["cat", "ant", "cat", "cat", "ant", "bird"]
pred_labels = ["ant", "ant", "cat", "cat", "ant", "cat"]
confusion_matrix(gt_labels, pred_labels, labels=["ant", "bird", "cat"])
# array([[2, 0, 0],
# [0, 0, 1],
# [1, 0, 2]])
Example 3:
Binary classification:
from sklearn.metrics import confusion_matrix
tn, fp, fn, tp = confusion_matrix([0, 1, 0, 1], [1, 1, 1, 0]).ravel()
# (tn, fp, fn, tp)
# (0, 2, 1, 1)
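The overall, per-class, and average per-class accuracies mentioned at the top can be read directly off the confusion matrix. The following is a rough sketch using the data from Example 1; the variable names overall_acc, per_class_acc, and mean_acc are chosen here for illustration, and it assumes every class occurs at least once in gt_labels (otherwise a row sum is zero and the division is undefined).

```python
import numpy as np
from sklearn.metrics import confusion_matrix

gt_labels = [2, 0, 2, 2, 0, 1]
pred_labels = [0, 0, 2, 2, 0, 2]
C = confusion_matrix(gt_labels, pred_labels)

# Overall accuracy: correctly classified samples (the diagonal) over all samples.
overall_acc = np.trace(C) / C.sum()

# Per-class accuracy: diagonal entry over the row sum,
# i.e. over the number of ground-truth samples of that class.
per_class_acc = np.diag(C) / C.sum(axis=1)

# Average per-class accuracy: unweighted mean over classes.
mean_acc = per_class_acc.mean()

print(overall_acc, per_class_acc, mean_acc)
```

Here class 1 has a single ground-truth sample that is misclassified, so its per-class accuracy is 0 and it pulls the average below the overall accuracy.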