Using LMDB to read image data and labels, handling the multi-label problem, based on a ResNet101 backbone network. Related: multi-task outputs, automatic image tagging.
The previous post tried reading multiple labels by modifying the Caffe ImageDataLayer source - ImageMultilabelDataLayer [Caffe实践 - 多标签图片分类的训练与部署].
Modifying the source code can feel a bit cumbersome, since it requires recompiling Caffe.
Here a new approach to automatic multi-label tagging is tried.
The difference from [Caffe实践 - 多标签图片分类的训练与部署] is that the earlier post handles the problem in a multi-task way, where each task is a single-label classification problem, whereas here the problem is handled as multi-label annotation.
<h2>1. Dataset</h2>
<h3>1.1 Data Format Conversion</h3>
The data looks like the following, e.g. images_labels.txt:
img1.jpg 1 0 1 ... 0
img2.jpg 0 1 0 ... 1
img3.jpg 1 1 0 ... 0
......
Each line is one data sample, whose multi-label annotation is a 0/1 vector; all samples have multi-label vectors of the same length.
The multi-label data could be read in a way similar to [Caffe实践 - 多标签图片分类的训练与部署].
Here, however, a different approach is used:
First, convert the data into two files with the following formats (a conversion sketch follows the two formats below):
- imageslist.txt
img1.jpg 2
img2.jpg 5
img3.jpg 3
......
Each line corresponds to one data sample, in the form: image file name + number of labels attached to that image.
- labelslist.txt
1 0 1 ... 0
0 1 0 ... 1
1 1 0 ... 0
......
Each line is the 0/1 multi-label vector of the corresponding data sample.
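A minimal conversion sketch (my own addition, assuming the images_labels.txt format above; the file names are placeholders):

# split images_labels.txt into imageslist.txt and labelslist.txt
with open('images_labels.txt') as fin, \
     open('imageslist.txt', 'w') as fimg, \
     open('labelslist.txt', 'w') as flab:
    for line in fin:
        items = line.strip().split(' ')
        image_name = items[0]
        labels = [int(x) for x in items[1:]]  # the 0/1 multi-label vector
        # imageslist.txt: image name + number of positive labels
        fimg.write('%s %d\n' % (image_name, sum(labels)))
        # labelslist.txt: the 0/1 vector itself
        flab.write(' '.join(str(x) for x in labels) + '\n')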
<h3>1.2 Generating the Label LMDB</h3>
gen_label_lmdb.py:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import sys
import numpy
import argparse
import lmdb
from caffe.proto import caffe_pb2


def parse_args():
    parser = argparse.ArgumentParser(description='Generate label lmdb')
    parser.add_argument('--labels', dest='labels',
                        help='label txt file',
                        default=None, type=str)
    parser.add_argument('--images', dest='images',
                        help='image txt file, for keys of datum',
                        default=None, type=str)
    parser.add_argument('--lmdb', dest='lmdb',
                        help='label lmdb file',
                        default=None, type=str)
    if len(sys.argv) == 1:
        parser.print_help()
        sys.exit(1)
    return parser.parse_args()


def array_to_datum(arr, label=0):
    """
    Converts a 3-dimensional array to datum.
    If the array has dtype uint8, the output data will be encoded as a string.
    Otherwise, the output data will be stored in float format.
    """
    if arr.ndim != 3:
        raise ValueError('Incorrect array shape.')
    datum = caffe_pb2.Datum()
    datum.channels, datum.height, datum.width = arr.shape
    if arr.dtype == numpy.uint8:
        datum.data = arr.tostring()
    else:
        datum.float_data.extend(arr.flat)
    datum.label = label
    return datum


if __name__ == '__main__':
    print 'Starting to generate lmdb'
    args = parse_args()

    labels_array = numpy.loadtxt(args.labels)
    num_samples, num_classes = labels_array.shape

    images_keys_list = open(args.images).readlines()
    assert num_samples == len(images_keys_list)

    labels_db = lmdb.open(args.lmdb, map_size=20 * 1024 * 1024 * 1024)
    with labels_db.begin(write=True) as txn:
        for idx in range(0, num_samples):
            # label data: reshape the 0/1 vector to C x 1 x 1
            labels_cur = labels_array[idx, :]
            num_labels = labels_cur.size
            labels_cur = labels_cur.reshape(num_labels, 1, 1)

            # key for this label datum: zero-padded index + image name
            key_cur = images_keys_list[idx].split(' ')[0]  # image name
            key_cur = "{:0>8d}".format(idx) + '_' + key_cur

            num_taggs = int(images_keys_list[idx].split(' ')[1])
            assert num_taggs == labels_cur.sum()

            # create datum
            labels_datum = array_to_datum(labels_cur, num_taggs)

            # write datum to lmdb
            txn.put(key_cur.encode('ascii'), labels_datum.SerializeToString())
            if (idx + 1) % 5000 == 0:
                print 'Processed', idx + 1, 'files in total.'

    # sanity check: count the items written to the label lmdb
    num_label_db = 0
    with labels_db.begin() as txn:
        cursor = txn.cursor()
        for key, value in cursor:
            num_label_db = num_label_db + 1
    print 'Total # of items in label lmdb:', num_label_db
Run from the command line:
python gen_label_lmdb.py --labels ./data/labelslist.txt --images ./data/imageslist.txt --lmdb ./data/output_lmdb/
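To double-check what was written, the following quick sketch (my own addition, not part of the original script) decodes the first few datums and prints their shapes and label counts:

# read back a few datums from the generated label lmdb
import lmdb
from caffe.proto import caffe_pb2
import caffe

db = lmdb.open('./data/output_lmdb/', readonly=True)
with db.begin() as txn:
    cursor = txn.cursor()
    for i, (key, value) in enumerate(cursor):
        datum = caffe_pb2.Datum()
        datum.ParseFromString(value)
        labels = caffe.io.datum_to_array(datum)  # shape: (num_classes, 1, 1)
        print key, labels.shape, 'num_taggs =', datum.label, 'sum =', int(labels.sum())
        if i >= 4:
            break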
<h2>2. Model Definition and Training</h2>
<h3>2.1 train_val.prototxt</h3>
name: "resnet101" layer { name: "data" type: "ImageData" top: "data" top: "dummylabel" image_data_param { source: "/path/to/train_imagelist.txt" root_folder: "/path/to/images/" batch_size: 16 new_height: 256 new_width: 256 } transform_param { mirror: true crop_size: 224 mean_value: 104 mean_value: 117 mean_value: 123 } include { phase: TRAIN } } layer { name: "label" type: "Data" top: "label" data_param { source: "/path/to/train_output_lmdb" batch_size: 16 backend: LMDB } include { phase: TRAIN } } ## layer { name: "data" type: "ImageData" top: "data" top: "dummylabel" image_data_param { source: "/path/to/test_imagelist.txt" root_folder: "/path/to/images/" batch_size: 4 new_height: 256 new_width: 256 } transform_param { crop_size: 224 mean_value: 104 mean_value: 117 mean_value: 123 } include { phase: TEST } } layer { name: "label" type: "Data" top: "label" data_param { source: "/path/to/test_output_lmdb" batch_size: 4 backend: LMDB } include { phase: TEST } } ####### dummylabel ####### # 这里 dummpylabe 是未用到的 layer { name: "label_silence" type: "Silence" bottom: "dummylabel" } #### resnet 101 netword define #### ...... ############################# layer { name: "fc_labels" type: "InnerProduct" bottom: "Pooling_" top: "fc_labels" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 1 decay_mult: 0 } inner_product_param { num_output: 30 # labels 数 weight_filler { type: "gaussian" std: 0.005 } bias_filler { type: "constant" value: 0 } } } layer { name: "loss" type: "SigmoidCrossEntropyLoss" # bottom: "fc_labels" bottom: "label" top: "loss" }
<h3>2.2 deploy.prototxt</h3>
name: "resnet101" input: "data" input_shape { dim: 1 dim: 3 dim: 224 dim: 224 } #### netword define #### ...... #################### layer { name: "fc_labels" type: "InnerProduct" bottom: "Pooling_" top: "fc_labels" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 1 decay_mult: 0 } inner_product_param { num_output: 30 # labels 数 weight_filler { type: "gaussian" std: 0.005 } bias_filler { type: "constant" value: 0 } } }
<h2>3. Model Deployment</h2>
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import numpy as np
from PIL import Image
import scipy.misc
import matplotlib.pyplot as plt
import sys

caffe_root = '/path/to/caffe/'
sys.path.insert(0, caffe_root + 'python')
import caffe

caffe.set_mode_gpu()
caffe.set_device(0)
# caffe.set_mode_cpu()


class SimpleTransformer(object):
    """
    SimpleTransformer is a simple class for preprocessing and
    deprocessing images for caffe.
    (Defined here for reference; caffe.io.Transformer is used below.)
    """
    def __init__(self, mean=[128, 128, 128]):
        self.mean = np.array(mean, dtype=np.float32)
        self.scale = 1.0

    def set_mean(self, mean):
        """Set the mean to subtract for centering the data."""
        self.mean = mean

    def set_scale(self, scale):
        """Set the data scaling."""
        self.scale = scale

    def preprocess(self, im):
        """Emulates the pre-processing occurring in the vgg16 caffe prototxt."""
        im = np.float32(im)
        im = im[:, :, ::-1]  # change to BGR
        im -= self.mean
        im *= self.scale
        im = im.transpose((2, 0, 1))
        return im


if __name__ == '__main__':
    print 'Start...'

    multilabels = ['a', 'b', 'c', ...]  # label names, in the same order as the 0/1 vector

    test_image = '/home/sh/Pictures/upper/10.jpg'
    im = np.asarray(Image.open(test_image))
    im = scipy.misc.imresize(im, [224, 224])

    model_def = '/path/to/deploy.prototxt'
    weight_def = '/path/to/multilabel_vgg16_iter_100000.caffemodel'
    net = caffe.Net(model_def, weight_def, caffe.TEST)

    transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
    transformer.set_transpose('data', (2, 0, 1))              # hwc -> chw
    transformer.set_mean('data', np.array((104, 117, 123)))   # mean pixel
    transformer.set_raw_scale('data', 255)                    # the net operates on images in [0, 255]
    transformer.set_channel_swap('data', (2, 1, 0))           # RGB -> BGR

    transformed_image = transformer.preprocess('data', im)
    net.blobs['data'].data[...] = transformed_image

    outputs = net.forward()
    img_scores = outputs[net.outputs[0]]
    # apply a sigmoid to the raw fc_labels outputs
    img_scores = np.array([1. / (1 + np.exp(-tmp)) for tmp in img_scores[0]])

    # labels whose sigmoid score exceeds 0.5 are predicted as present
    pred_labels_index = np.where(img_scores > 0.5)[0].tolist()
    print 'pred_labels:', pred_labels_index, '----', len(pred_labels_index)
    for tmp in pred_labels_index:
        print 'pred_labels:', multilabels[tmp], '----', len(pred_labels_index)
    print '--------------------------------------------'

    plt.imshow(im)
    plt.axis('off')
    plt.show()

    print 'Done.'
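To evaluate beyond a single test image, a micro-averaged precision/recall over a held-out set can be computed from the sigmoid scores and the ground-truth 0/1 vectors (my own sketch, not part of the original post; collecting pred_scores means running the forward pass above once per test image):

import numpy as np

def multilabel_precision_recall(pred_scores, gt_labels, thresh=0.5):
    """
    pred_scores: (N, C) sigmoid scores from the network, one row per test image.
    gt_labels:   (N, C) ground-truth 0/1 vectors (same format as labelslist.txt).
    Returns micro-averaged precision and recall at the given threshold.
    """
    pred = (pred_scores > thresh).astype(np.int32)
    tp = float(np.sum(pred * gt_labels))      # predicted 1 and actually 1
    precision = tp / max(np.sum(pred), 1)     # of the predicted labels, how many are correct
    recall = tp / max(np.sum(gt_labels), 1)   # of the true labels, how many are recovered
    return precision, recall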
<h2>Related</h2>
[1] - Caffe实践 - 多标签图片分类的训练与部署
<h2>Comments</h2>
Running gen_label_lmdb.py fails at assert num_taggs == labels_cur.sum() with
AssertionError
How can this be fixed?
It is probably because the label count recorded in imageslist.txt does not match the number of 1s in the corresponding line of labelslist.txt.
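A quick way to locate the offending lines (a sketch assuming the imageslist.txt / labelslist.txt layout from section 1.1):

import numpy

labels_array = numpy.loadtxt('labelslist.txt')
images_lines = open('imageslist.txt').readlines()
for idx, line in enumerate(images_lines):
    name, count = line.split(' ')[0], int(line.split(' ')[1])
    if count != int(labels_array[idx].sum()):
        print 'mismatch at line', idx, ':', name, count, 'vs', int(labels_array[idx].sum())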
Hi! What does the number after each image in imageslist.txt mean? Does the label count refer to how many labels the whole dataset has, or something else?
It is the number of labels attached to that particular image sample, i.e. how many labels the image contains.