Mutual Information (MI) measures the agreement between two label assignments of the same data. It is defined as:
$$ MI(U, V) = \sum_{i=1}^{|U|} \sum_{j=1}^{|V|} \frac{|U_i \cap V_j|}{N} \log \frac{N|U_i \cap V_j|}{|U_i||V_j|} $$
where $|U_i|$ is the number of samples in cluster $U_i$, $|V_j|$ is the number of samples in cluster $V_j$, and $N$ is the total number of samples.
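MI does not depend on the absolute label values: permuting the class or cluster label values does not change the MI score.
MI is also symmetric: $MI(U, V) = MI(V, U)$.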
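The formula and the two properties above can be checked with a minimal pure-Python sketch. The function name `mutual_information` and the sample labelings are illustrative assumptions, not part of any library; the natural logarithm is used, so the result is in nats, and empty intersections are skipped (the usual $0 \log 0 = 0$ convention).

```python
import math
from collections import Counter


def mutual_information(u, v):
    """MI between two labelings of the same samples, in nats (hypothetical helper)."""
    if len(u) != len(v):
        raise ValueError("both labelings must cover the same samples")
    n = len(u)
    size_u = Counter(u)           # |U_i| for every cluster in U
    size_v = Counter(v)           # |V_j| for every cluster in V
    overlap = Counter(zip(u, v))  # size of each non-empty intersection of U_i and V_j
    mi = 0.0
    for (i, j), n_ij in overlap.items():
        mi += (n_ij / n) * math.log(n * n_ij / (size_u[i] * size_v[j]))
    return mi


c1 = [0, 0, 1, 1]
c2 = [1, 1, 0, 0]  # same partition as c1, label values swapped
c3 = [0, 1, 1, 1]  # a different partition

mi_same = mutual_information(c1, c1)
mi_perm = mutual_information(c1, c2)
mi_13 = mutual_information(c1, c3)
mi_31 = mutual_information(c3, c1)

print("MI(c1, c1):", mi_same)
print("MI(c1, c2):", mi_perm)   # equal to MI(c1, c1): label values do not matter
print("MI(c1, c3):", mi_13)
print("MI(c3, c1):", mi_31)     # equal to MI(c1, c3): symmetry
```

Note that the unnormalized MI of two identical labelings is not 1 but the entropy of the labeling (here $\log 2$), which is one reason the normalized variants are preferred for reporting.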
Two commonly used variants of MI are Normalized Mutual Information (NMI) and Adjusted Mutual Information (AMI). Of the two, NMI is the one more often reported in papers.
1. NMI - sklearn
from sklearn.metrics.cluster import normalized_mutual_info_score
# Two identical labelings of four samples
c1 = [0, 0, 1, 1]
c2 = [0, 0, 1, 1]
nmi = normalized_mutual_info_score(c1, c2)
print('[INFO]NMI: ', nmi)
# 1.0
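To make the normalization explicit, here is a hedged pure-Python sketch that divides MI by the arithmetic mean of the two entropies. This mirrors what I understand to be the default `average_method='arithmetic'` in recent scikit-learn versions, but treat that correspondence as an assumption; the helper names `entropy`, `mutual_information`, and `nmi_arithmetic` are illustrative only.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy of a labeling, in nats."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(u, v):
    """MI between two labelings of the same samples, in nats."""
    n = len(u)
    size_u, size_v, overlap = Counter(u), Counter(v), Counter(zip(u, v))
    return sum((n_ij / n) * math.log(n * n_ij / (size_u[i] * size_v[j]))
               for (i, j), n_ij in overlap.items())


def nmi_arithmetic(u, v):
    """NMI normalized by the arithmetic mean of H(U) and H(V) (hypothetical helper)."""
    mean_h = (entropy(u) + entropy(v)) / 2
    if mean_h == 0:
        # Both labelings put every sample in one cluster; treated here,
        # by assumption, as perfect agreement.
        return 1.0
    return mutual_information(u, v) / mean_h


c1 = [0, 0, 1, 1]
c2 = [1, 1, 0, 0]  # same partition, label values swapped
c3 = [0, 1, 1, 1]  # partially agreeing partition

nmi_perm = nmi_arithmetic(c1, c2)
nmi_partial = nmi_arithmetic(c1, c3)
print("NMI(c1, c2):", nmi_perm)     # permuted labels still give a perfect score
print("NMI(c1, c3):", nmi_partial)  # strictly between 0 and 1
```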
2. AMI - sklearn
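$$ AMI(U, V) = \frac{MI(U, V) - E(MI(U, V))}{avg(H(U), H(V)) - E(MI(U, V))} $$
where $H(U)$ and $H(V)$ are the entropies of the two labelings and $E(MI(U, V))$ is the expected MI of two random labelings that have the same cluster sizes as $U$ and $V$.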
from sklearn.metrics.cluster import adjusted_mutual_info_score
# Two identical labelings of four samples
c1 = [0, 0, 1, 1]
c2 = [0, 0, 1, 1]
ami = adjusted_mutual_info_score(c1, c2)
print('[INFO]AMI: ', ami)
# 1.0
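The AMI formula can also be evaluated by hand. The sketch below is a minimal pure-Python version, assuming the expected MI is taken under the permutation (hypergeometric) model with fixed cluster sizes and that `avg` is the arithmetic mean; all function names are hypothetical, and `math.comb` requires Python 3.8 or later.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy of a labeling, in nats."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(u, v):
    """MI between two labelings of the same samples, in nats."""
    n = len(u)
    size_u, size_v, overlap = Counter(u), Counter(v), Counter(zip(u, v))
    return sum((n_ij / n) * math.log(n * n_ij / (size_u[i] * size_v[j]))
               for (i, j), n_ij in overlap.items())


def expected_mutual_information(u, v):
    """E[MI] over random labelings with the same cluster sizes as u and v."""
    n = len(u)
    emi = 0.0
    for a in Counter(u).values():          # a = |U_i|
        for b in Counter(v).values():      # b = |V_j|
            # Feasible non-zero intersection sizes for clusters of size a and b
            for n_ij in range(max(1, a + b - n), min(a, b) + 1):
                # Hypergeometric probability of an intersection of size n_ij
                prob = math.comb(b, n_ij) * math.comb(n - b, a - n_ij) / math.comb(n, a)
                emi += prob * (n_ij / n) * math.log(n * n_ij / (a * b))
    return emi


def ami_arithmetic(u, v):
    """AMI with avg(H(U), H(V)) taken as the arithmetic mean (hypothetical helper)."""
    if len(set(u)) == 1 and len(set(v)) == 1:
        # Degenerate case (one cluster on each side): treated here,
        # by assumption, as perfect agreement.
        return 1.0
    mi = mutual_information(u, v)
    emi = expected_mutual_information(u, v)
    mean_h = (entropy(u) + entropy(v)) / 2
    return (mi - emi) / (mean_h - emi)


c1 = [0, 0, 1, 1]
c3 = [0, 1, 1, 1]

ami_same = ami_arithmetic(c1, c1)
ami_partial = ami_arithmetic(c1, c3)
print("AMI(c1, c1):", ami_same)
print("AMI(c1, c3):", ami_partial)
```

For `c1` versus `c3`, every arrangement compatible with the cluster sizes (2, 2) and (1, 3) produces the same MI, so the observed MI (about 0.216 nats) equals its expectation and the AMI is 0 up to floating-point error. This is the chance correction that distinguishes AMI from the unadjusted score.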