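Tracking how the learning rate, train loss, test loss, and similar quantities change during training helps in understanding the training state of a network model.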

When Caffe training is launched from a shell script, the training output can be written to a log file, for example:

$CAFFE_ROOT/build/tools/caffe train \
    --solver=solver.prototxt \
    --weights=pretrained.caffemodel \
    --gpu 0 \
    2>&1 | tee train.log

Caffe provides a tool for parsing the resulting log file, parse_log.py:

$CAFFE_ROOT/tools/extra/parse_log.py train.log ./

This writes two parsed files to the output directory:

train.log.train
train.log.test

Their contents are CSV-formatted, train.log.train first and train.log.test second:

NumIters,Seconds,LearningRate,loss
0.0,0.366678,0.05,4.30619
10.0,3.210073,0.05,2.73271
20.0,6.03005,0.05,8.48341
......

NumIters,Seconds,LearningRate,acc/top-1,acc/top-5,loss
7000.0,2266.206901,0.05,0.240812,0.591906,2.67359
14000.0,4538.298707,0.05,0.42175,0.780375,2.01819
21000.0,6798.336418,0.05,0.491844,0.832719,1.7494
......
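Because both files are plain CSV, they can be inspected with pandas before any plotting. The following is a minimal sketch, not part of Caffe: the helper names `load_log` and `best_row` are hypothetical, and the sample rows are copied from the parsed output shown above. The metric columns (`acc/top-1`, `acc/top-5`, `loss`) presumably follow the output names of the network being trained, so other models will likely have different columns.

```python
import io

import pandas as pd

# Sample rows copied from the parsed output shown above.
TRAIN_CSV = """NumIters,Seconds,LearningRate,loss
0.0,0.366678,0.05,4.30619
10.0,3.210073,0.05,2.73271
20.0,6.03005,0.05,8.48341
"""
TEST_CSV = """NumIters,Seconds,LearningRate,acc/top-1,acc/top-5,loss
7000.0,2266.206901,0.05,0.240812,0.591906,2.67359
14000.0,4538.298707,0.05,0.42175,0.780375,2.01819
21000.0,6798.336418,0.05,0.491844,0.832719,1.7494
"""


def load_log(source):
    """Read one parsed log (a path or a file-like object) into a DataFrame."""
    return pd.read_csv(source)


def best_row(log, column, higher_is_better=True):
    """Return the row holding the best value of `column` (hypothetical helper)."""
    if higher_is_better:
        idx = log[column].idxmax()
    else:
        idx = log[column].idxmin()
    return log.loc[idx]


# In practice, pass "train.log.train" / "train.log.test" instead of StringIO.
train_log = load_log(io.StringIO(TRAIN_CSV))
test_log = load_log(io.StringIO(TEST_CSV))

best_test = best_row(test_log, "acc/top-1")
lowest_train = best_row(train_log, "loss", higher_is_better=False)
print("columns:", list(test_log.columns))
print("best top-1 at iteration", int(best_test["NumIters"]))
print("lowest train loss at iteration", int(lowest_train["NumIters"]))
```

Checking the column list first is a cheap way to catch a mismatch between the column names assumed by a plotting script and the names actually present in the parsed files.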

From the parsed results, the train loss, test loss, and accuracy curves can be plotted, for example:

import pandas as pd
import matplotlib.pyplot as plt

# Load the two CSV files produced by parse_log.py.
train_log = pd.read_csv("train.log.train")
test_log = pd.read_csv("train.log.test")

_, ax1 = plt.subplots()
ax1.set_title("train loss and test loss")

# Left axis: train and test loss.
ax1.plot(train_log["NumIters"], train_log["loss"], alpha=0.5, label="train loss")
ax1.plot(test_log["NumIters"], test_log["loss"], 'g', label="test loss")
ax1.set_xlabel('iteration')
ax1.set_ylabel('loss')
ax1.legend(loc='upper left')

# Right axis: test accuracy, sharing the same x axis.
ax2 = ax1.twinx()
ax2.plot(test_log["NumIters"], test_log["acc/top-1"], 'r', label="top-1 accuracy")
ax2.plot(test_log["NumIters"], test_log["acc/top-5"], 'm', label="top-5 accuracy")
ax2.set_ylabel('test accuracy')
ax2.legend(loc='upper right')

plt.show()
print('Done.')
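The parsed files also carry a LearningRate column, so the learning-rate schedule can be plotted in the same way. The sketch below is one possible approach, not part of Caffe: `plot_learning_rate` is a hypothetical helper, the sample rows are copied from the train.log.train excerpt above (where the rate is constant at 0.05), and the Agg backend is an assumption made so the example runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless rendering; remove to show a window
import matplotlib.pyplot as plt
import pandas as pd

# Sample rows copied from the train.log.train excerpt above.
TRAIN_CSV = """NumIters,Seconds,LearningRate,loss
0.0,0.366678,0.05,4.30619
10.0,3.210073,0.05,2.73271
20.0,6.03005,0.05,8.48341
"""


def plot_learning_rate(log, ax=None):
    """Plot LearningRate against NumIters and return the axes (hypothetical helper)."""
    if ax is None:
        _, ax = plt.subplots()
    ax.plot(log["NumIters"], log["LearningRate"], "b", label="learning rate")
    ax.set_title("learning rate")
    ax.set_xlabel("iteration")
    ax.set_ylabel("learning rate")
    ax.legend(loc="upper right")
    return ax


# In practice, read "train.log.train" instead of the in-memory sample.
lr_log = pd.read_csv(io.StringIO(TRAIN_CSV))
lr_ax = plot_learning_rate(lr_log)
# lr_ax.figure.savefig("learning_rate.png")  # uncomment to write the plot to disk
```

Returning the axes rather than calling `plt.show()` inside the helper leaves it to the caller to display the figure, save it, or draw further curves on the same axes.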

Last modification:October 9th, 2018 at 09:31 am