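Single-label image classification means that each image carries exactly one label, for example either dog or cat.
Image classification in the usual sense is single-label. It is also one of the first problems that CNNs solved well.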
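<h2>1. Preparing the training data</h2>
The main task is to generate the train.txt, val.txt and test.txt files, each listing one image per line in the following format: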
image_name1 label1
image_name2 label2
image_name3 label3
......
For example:
img1.jpg 0
img2.jpg 0
img3.jpg 1
img4.jpg 1
img5.jpg 2
img6.jpg 2
......
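The list files can be written by hand, but for larger datasets a short script is easier. The sketch below assumes the images are stored in one sub-directory per class (for example `train/cat/`, `train/dog/`), which is a layout I am assuming here and not something the steps above require. The helper names `build_list_lines` and `write_list` are made up for illustration.

```python
import os


def build_list_lines(image_root, class_names):
    """Return 'relative/path label' lines for images under image_root/<class_name>/.

    The label of a class is its position in class_names, starting at 0.
    """
    lines = []
    for label, name in enumerate(class_names):
        class_dir = os.path.join(image_root, name)
        for fname in sorted(os.listdir(class_dir)):
            if fname.lower().endswith(('.jpg', '.jpeg', '.png')):
                # Path relative to image_root, then a space, then the integer label.
                lines.append('%s/%s %d' % (name, fname, label))
    return lines


def write_list(lines, out_path):
    """Write the list lines to a file such as train.txt, one entry per line."""
    with open(out_path, 'w') as f:
        f.write('\n'.join(lines) + '\n')
```

Usage would look like `write_list(build_list_lines('/path/to/images/train', ['cat', 'dog']), 'train.txt')`. The paths are written relative to the image root so that they can be combined with the data root directory used in the next step.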
<h2>2. Generating the LMDB datasets</h2>
This step uses the create_imagenet.sh and make_imagenet_mean.sh scripts that ship with Caffe. The parts that need to change are:
- create_imagenet.sh
#!/usr/bin/env sh
# Create the imagenet lmdb inputs
# N.B. set the path to the imagenet train + val data dirs
set -e
EXAMPLE=/path/to/save/lmdb/ # where the generated lmdb is saved
DATA=/path/to/tranorval/txt/ # directory containing train.txt and val.txt
TOOLS=/path/to/caffe/build/tools # tools built with Caffe
TRAIN_DATA_ROOT=/path/to/images/train/ # root directory of the training images
VAL_DATA_ROOT=/path/to/images/val/ # root directory of the validation images
# Set RESIZE=true to resize the images to 256x256. Leave as false if images have
# already been resized using another tool.
RESIZE=false # change to true to resize every image to 256x256
if $RESIZE; then
RESIZE_HEIGHT=256
RESIZE_WIDTH=256
else
RESIZE_HEIGHT=0
RESIZE_WIDTH=0
fi
if [ ! -d "$TRAIN_DATA_ROOT" ]; then
echo "Error: TRAIN_DATA_ROOT is not a path to a directory: $TRAIN_DATA_ROOT"
echo "Set the TRAIN_DATA_ROOT variable in create_imagenet.sh to the path" \
"where the ImageNet training data is stored."
exit 1
fi
if [ ! -d "$VAL_DATA_ROOT" ]; then
echo "Error: VAL_DATA_ROOT is not a path to a directory: $VAL_DATA_ROOT"
echo "Set the VAL_DATA_ROOT variable in create_imagenet.sh to the path" \
"where the ImageNet validation data is stored."
exit 1
fi
echo "Creating train lmdb..."
GLOG_logtostderr=1 $TOOLS/convert_imageset \
--resize_height=$RESIZE_HEIGHT \
--resize_width=$RESIZE_WIDTH \
--shuffle \
$TRAIN_DATA_ROOT \
$DATA/train.txt \
$EXAMPLE/train_lmdb # lmdb for the training set; the name can be changed
echo "Creating val lmdb..."
GLOG_logtostderr=1 $TOOLS/convert_imageset \
--resize_height=$RESIZE_HEIGHT \
--resize_width=$RESIZE_WIDTH \
--shuffle \
$VAL_DATA_ROOT \
$DATA/val.txt \
$EXAMPLE/val_lmdb # lmdb for the validation set; the name can be changed
echo "Done."
- make_imagenet_mean.sh
#!/usr/bin/env sh
# Compute the mean image from the imagenet training lmdb
# N.B. this is available in data/ilsvrc12
EXAMPLE=/path/to/save/lmdb/ # where the generated lmdb is saved
DATA=/path/to/tranorval/txt/ # directory containing train.txt and val.txt
TOOLS=/path/to/caffe/build/tools # tools built with Caffe
$TOOLS/compute_image_mean $EXAMPLE/train_lmdb \
$DATA/images_mean.binaryproto # generated mean file; the name can be changed
echo "Done."
Run create_imagenet.sh and then make_imagenet_mean.sh to generate the LMDB files for the training and validation sets, along with the corresponding mean file.
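Before running create_imagenet.sh, it can help to confirm that every entry in a list file points to an image that actually exists under the matching data root, since a single bad path can interrupt the conversion. The following is a minimal sketch of such a check; the function name `find_missing_images` is hypothetical and not part of Caffe.

```python
import os


def find_missing_images(list_path, data_root):
    """Return the relative paths listed in list_path that do not exist under data_root."""
    missing = []
    with open(list_path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            # Split from the right so that the last field is always the label.
            rel_path, _label = line.rsplit(' ', 1)
            if not os.path.isfile(os.path.join(data_root, rel_path)):
                missing.append(rel_path)
    return missing
```

For example, `find_missing_images('/path/to/tranorval/txt/train.txt', '/path/to/images/train/')` should return an empty list when the training list is consistent with the image directory.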
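To convert the images_mean.binaryproto mean file to images_mean.npy:
import numpy as np
import caffe
caffeBlob = caffe.proto.caffe_pb2.BlobProto()
data = open('images_mean.binaryproto', 'rb').read() # mean file written by make_imagenet_mean.sh
caffeBlob.ParseFromString(data)
array = np.array(caffe.io.blobproto_to_array(caffeBlob)) # shape: (1, channels, height, width)
mean_file = 'images_mean.npy'
np.save(mean_file, array[0]) # drop the leading dimension before saving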
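<h2>3. Modifying the network and training the model</h2>
For networks such as AlexNet, CaffeNet or GoogLeNet, only the data layers and the output layer need to be modified.
<h3>3.1 Modifying the data layers</h3>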
layer {
name: "data"
type: "Data"
top: "data"
top: "label"
include {
phase: TRAIN
}
transform_param {
mirror: true
crop_size: 224
mean_file: "/path/to/images_mean.binaryproto" # path to the mean file
#mean_value: 104
#mean_value: 117
#mean_value: 123
}
data_param {
source: "/path/to/train_lmdb" # lmdb of the training set
batch_size: 32
backend: LMDB
}
}
layer {
name: "data"
type: "Data"
top: "data"
top: "label"
include {
phase: TEST
}
transform_param {
mirror: false
crop_size: 224
mean_file: "/path/to/images_mean.binaryproto" # path to the mean file
#mean_value: 104
#mean_value: 117
#mean_value: 123
}
data_param {
source: "/path/to/val_lmdb" # lmdb of the validation set
batch_size: 50 # in general, batch_size * test_iter (in the solver) should roughly equal the validation set size
backend: LMDB
}
}
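The relationship between the TEST batch_size and the solver's test_iter is simple arithmetic, sketched below. The helper name `compute_test_iter` is hypothetical. Rounding up is my own choice so that every validation image is seen at least once per test pass; as far as I know the data layer then wraps around to the start of the LMDB for the last partial batch, so a few images may be counted twice.

```python
import math


def compute_test_iter(num_val_images, test_batch_size):
    """Number of test iterations needed to cover the validation set once."""
    return int(math.ceil(num_val_images / float(test_batch_size)))


# 50000 validation images with batch_size 50 give test_iter = 1000.
print(compute_test_iter(50000, 50))
```

The number of validation images is just the number of non-empty lines in val.txt.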
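<h3>3.2 Modifying the output layer</h3>
Most networks, such as AlexNet and CaffeNet, have a single output, so only one place needs to change. GoogLeNet needs changes in three places, because it also has two auxiliary loss branches. AlexNet is used as the example here.
- Changes to train_val.prototxt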
layer {
name: "fc8" # name of the output layer; needs to be changed
type: "InnerProduct"
bottom: "fc7"
top: "fc8" # name of the output layer; needs to be changed
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output: 1000 # number of outputs, i.e. the number of classes; needs to be changed
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "accuracy"
type: "Accuracy"
bottom: "fc8" # must match the name of the output layer
bottom: "label"
top: "accuracy"
include {
phase: TEST
}
}
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "fc8" # must match the name of the output layer
bottom: "label"
top: "loss"
}
- Changes to deploy.prototxt
layer {
name: "fc8" # same as the output layer name in train_val.prototxt
type: "InnerProduct"
bottom: "fc7"
top: "fc8" # same as the output layer name in train_val.prototxt
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output: 1000 # same as the number of classes in train_val.prototxt
}
}
layer {
name: "prob"
type: "Softmax"
bottom: "fc8"
top: "prob"
}
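The value of num_output has to cover every label that appears in the list files. A small sketch for deriving it from train.txt is shown below; the helper names are hypothetical. It also checks that the labels run from 0 to N-1 without gaps, which I am assuming is what you want, since a softmax loss expects labels in the range [0, num_output).

```python
def read_labels(list_path):
    """Collect the set of integer labels used in a list file such as train.txt."""
    labels = set()
    with open(list_path) as f:
        for line in f:
            line = line.strip()
            if line:
                # The label is the last space-separated field on the line.
                labels.add(int(line.rsplit(' ', 1)[1]))
    return labels


def required_num_output(labels):
    """Smallest num_output that covers every label (labels start at 0)."""
    return max(labels) + 1


def labels_are_contiguous(labels):
    """True if the labels are exactly 0, 1, ..., N-1."""
    return labels == set(range(len(labels)))
```

With the example list from section 1 (labels 0, 1 and 2), `required_num_output` gives 3, so num_output would be set to 3 in both train_val.prototxt and deploy.prototxt.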
<h3>3.3 Training the model</h3>
- Edit solver.prototxt
net: "/path/to/alexnet/train_val.prototxt"
test_iter: 1000 # batch_size * test_iter should roughly equal the validation set size
test_interval: 4000 # run a test pass every this many training iterations
test_initialization: false
display: 40
average_loss: 40
base_lr: 0.01
lr_policy: "step"
stepsize: 320000
gamma: 0.96
max_iter: 10000000
momentum: 0.9
weight_decay: 0.0002
snapshot: 40000
snapshot_prefix: "/path/to/alexnet" # where the trained caffemodel snapshots are saved
solver_mode: GPU
- Edit and run trainnet.sh
#!/usr/bin/env sh
./build/tools/caffe train \
-solver /path/to/alexnet/solver.prototxt \
-gpu 0
Run the script:
./trainnet.sh
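To get a feel for what the solver settings above do to the learning rate, the "step" policy can be reproduced in a few lines. To my understanding, Caffe computes the rate as base_lr * gamma ^ floor(iter / stepsize); the function below is only a sketch of that formula, with defaults copied from the solver.prototxt shown above and a made-up function name.

```python
def step_lr(iteration, base_lr=0.01, gamma=0.96, stepsize=320000):
    """Learning rate at a given iteration under the 'step' policy."""
    # The rate is multiplied by gamma once every `stepsize` iterations.
    return base_lr * gamma ** (iteration // stepsize)


for it in (0, 320000, 640000, 3200000):
    print(it, step_lr(it))
```

With gamma this close to 1 and such a large stepsize, the rate decays slowly, so it is worth checking that the schedule suits the size of your own dataset.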
<h2>4. Testing on an image</h2>
Use deploy.prototxt and the trained caffemodel to classify a test image. The Python script is as follows:
# coding=utf-8
import numpy as np
import caffe
img = '/path/to/test.jpg'
classes = ['label_0_name', 'label_1_name', 'label_2_name'] # placeholder class names, ordered by label index
mean_file = '/path/to/images_mean.npy' # mean file converted to .npy in section 2
model_def = '/path/to/deploy.prototxt'
weight_def = '/path/to/alexnet_iter_xxx.caffemodel' # trained caffemodel
net = caffe.Net(model_def, weight_def, caffe.TEST) # load the network and weights in TEST mode
# image preprocessing setup
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2,0,1)) # reorder dimensions from HxWxC to CxHxW
transformer.set_mean('data', np.load(mean_file).mean(1).mean(1)) # subtract the per-channel mean
transformer.set_raw_scale('data', 255) # rescale from [0,1] to [0,255]
transformer.set_channel_swap('data', (2,1,0)) # swap channels from RGB to BGR
im = caffe.io.load_image(img) # load the test image
net.blobs['data'].data[...] = transformer.preprocess('data', im) # preprocess the image and copy it into the input blob
out = net.forward()
prob = net.blobs['prob'].data[0].flatten() # output of the Softmax layer ("prob" in deploy.prototxt): per-class probabilities
idx = prob.argsort()[-1] # sort the probabilities and take the index of the largest
print('the class is: %s' % classes[idx]) # map the index to its class name
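When the top prediction looks wrong, it is often useful to see the next few candidates as well. The sketch below ranks the classes by probability and returns the best k; `top_k` is a hypothetical helper, and it works on a plain list or on the probability array produced by the script above.

```python
def top_k(probabilities, class_names, k=3):
    """Return the k most probable (class_name, probability) pairs, best first."""
    order = sorted(range(len(probabilities)),
                   key=lambda i: probabilities[i],
                   reverse=True)
    return [(class_names[i], float(probabilities[i])) for i in order[:k]]


# Toy example with three classes.
print(top_k([0.1, 0.7, 0.2], ['cat', 'dog', 'bird'], k=2))
```

In the test script this would be called as `top_k(prob, classes)`.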