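Tracking how the learning rate, train loss, and test loss change during training helps in understanding how the network's training is progressing.

When training with Caffe from a shell script, the training output can be saved to a log file (Caffe logs to stderr, hence the 2>&1), for example: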

$CAFFE_ROOT/build/tools/caffe train \
    --solver=solver.prototxt \
    --weights=pretrained.caffemodel \
    --gpu 0 \
    2>&1 | tee train.log

Caffe ships a tool for parsing the resulting log file, parse_log.py:

$CAFFE_ROOT/tools/extra/parse_log.py train.log ./

This writes two parsed files to the output directory (here ./):

train.log.train train.log.test

Their contents look like the following (train.log.train first, then train.log.test):

NumIters,Seconds,LearningRate,loss
0.0,0.366678,0.05,4.30619
10.0,3.210073,0.05,2.73271
20.0,6.03005,0.05,8.48341
......
NumIters,Seconds,LearningRate,acc/top-1,acc/top-5,loss
7000.0,2266.206901,0.05,0.240812,0.591906,2.67359
14000.0,4538.298707,0.05,0.42175,0.780375,2.01819
21000.0,6798.336418,0.05,0.491844,0.832719,1.7494
......
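
Both parsed files are plain CSV with a header row, so they can be read with any CSV reader. The following is a minimal sketch, assuming the column names shown above, that picks the test checkpoint with the highest top-1 accuracy. The sample rows are copied from the excerpt above, and best_test_row is a hypothetical helper name, not part of Caffe.

```python
import csv
import io

# Sample rows copied from the train.log.test excerpt above.
SAMPLE_TEST_LOG = """NumIters,Seconds,LearningRate,acc/top-1,acc/top-5,loss
7000.0,2266.206901,0.05,0.240812,0.591906,2.67359
14000.0,4538.298707,0.05,0.42175,0.780375,2.01819
21000.0,6798.336418,0.05,0.491844,0.832719,1.7494
"""


def best_test_row(lines, metric="acc/top-1"):
    """Return the row (as floats) with the highest value of `metric`.

    Hypothetical helper for illustration; `lines` is any iterable of
    CSV lines, such as an open file or a StringIO.
    """
    rows = list(csv.DictReader(lines))
    best = max(rows, key=lambda row: float(row[metric]))
    return {name: float(value) for name, value in best.items()}


best = best_test_row(io.StringIO(SAMPLE_TEST_LOG))
print(int(best["NumIters"]), best["acc/top-1"], best["loss"])
```

For a real run, pass an open file instead of the inline sample, e.g. open("train.log.test", newline="").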

With the parsed results, the train loss, test loss, and accuracy curves can be plotted, for example:

# Plot train/test loss and test accuracy from the parsed Caffe logs.
import pandas as pd
import matplotlib.pyplot as plt

train_log = pd.read_csv("train.log.train")
test_log = pd.read_csv("train.log.test")

_, ax1 = plt.subplots()
ax1.set_title("train/test loss and test accuracy")
ax1.plot(train_log["NumIters"], train_log["loss"], alpha=0.5, label="train loss")
ax1.plot(test_log["NumIters"], test_log["loss"], 'g', label="test loss")
ax1.set_xlabel('iteration')
ax1.set_ylabel('loss')
# Legend entries need labels; call legend on the axes that owns the lines.
ax1.legend(loc='upper left')

# Second y-axis, sharing the x-axis, for the accuracy curves.
ax2 = ax1.twinx()
ax2.plot(test_log["NumIters"], test_log["acc/top-1"], 'r', label="top-1 accuracy")
ax2.plot(test_log["NumIters"], test_log["acc/top-5"], 'm', label="top-5 accuracy")
ax2.set_ylabel('test accuracy')
ax2.legend(loc='upper right')

plt.show()

print('Done.')
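
The train loss is logged every few iterations and can jump around a lot (see the 8.48341 value at iteration 20 in the excerpt above), so a smoothed curve is sometimes easier to read. The following is a minimal sketch of a trailing moving average; the function name moving_average and the window size are assumptions chosen for illustration, not something the parsing tool provides.

```python
def moving_average(values, window=3):
    """Trailing moving average over `values`.

    The first `window - 1` outputs average over fewer points, so the
    result has the same length as the input.
    """
    smoothed = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            # Drop the value that just left the window.
            running -= values[i - window]
        smoothed.append(running / min(i + 1, window))
    return smoothed


# Train loss values copied from the train.log.train excerpt.
losses = [4.30619, 2.73271, 8.48341]
print(moving_average(losses, window=2))
```

To use it with the plot, pass list(train_log["loss"]) and draw the result against train_log["NumIters"]. With pandas, train_log["loss"].rolling(window).mean() gives a similar curve, except that the first window - 1 entries are NaN by default.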

Last modification: October 9th, 2018 at 09:31 am