Paper: Deep Metric Learning via Lifted Structured Feature Embedding
Project: Caffe - Deep-Metric-Learning-CVPR16/
1. Preparation
- Build and install Caffe-Deep-Metric-Learning-CVPR16
- GoogLeNet pre-trained on ImageNet
- Stanford Online Products dataset (2.9 GB), the dataset used in this experiment
- Cars Dataset
- Pre-trained models (265 MB)
2. Data Preparation
2.1. Dataset Format
Ebay_info.txt
Ebay_train.txt
Ebay_test.txt
Each file has the following content format:
image_id class_id super_class_id path
1 1 1 bicycle_final/111085122871_0.JPG
2 1 1 bicycle_final/111085122871_1.JPG
3 1 1 bicycle_final/111085122871_2.JPG
......
- image_id - image id
- class_id - product (item) id
- super_class_id - category (super-class) id of the image
- path - image path
Note: the image_id and class_id values run consecutively across Ebay_train.txt and Ebay_test.txt; the first sample in Ebay_test.txt continues directly from the last sample in Ebay_train.txt.
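For reference, here is a minimal Python snippet that parses one of these annotation files into parallel lists (the function name is illustrative; each file has one header line followed by one record per line):
# Minimal sketch: parse Ebay_train.txt / Ebay_test.txt
# ("image_id class_id super_class_id path" per line, after one header line).
def read_ebay_split(txt_path):
    image_ids, class_ids, super_class_ids, paths = [], [], [], []
    with open(txt_path) as f:
        next(f)                                  # skip the header line
        for line in f:
            img_id, cls_id, super_id, path = line.split()
            image_ids.append(int(img_id))
            class_ids.append(int(cls_id))
            super_class_ids.append(int(super_id))
            paths.append(path)
    return image_ids, class_ids, super_class_ids, paths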
config.m settings:
function conf = config()
conf = struct();
%% Directories
conf.root_path = '/path/to/Deep-Metric-Learning-CVPR16/code/ebay/';
conf.cache_path = '/path/to/Deep-Metric-Learning-CVPR16/code/ebay/cache';
conf.image_path = '/path/to/Ebay_Dataset/';
%% Training parameters
conf.preprocessing.crop_padding = 15;
conf.preprocessing.square_size = 256;
conf.preprocessing.num_to_load = 255;
conf.preprocessed_image_file = [conf.cache_path, '/training_images.mat'];
path_triplet = '/path/to/Deep-Metric-Learning-CVPR16/code/ebay/cache';
% for multilabel pairs batchsize = 64
conf.training_set_path_multilabel_m64 = [path_triplet, '/training_set_ebay_multilabel_m64.lmdb'];
% for multilabel pairs batchsize = 128
conf.training_set_path_multilabel_m128 = [path_triplet, '/training_set_ebay_multilabel_m128.lmdb'];
% for multilabel pairs batchsize = 128*2 = 256
conf.training_set_path_multilabel_m256 = [path_triplet, '/training_set_ebay_multilabel_m256.lmdb'];
% for multilabel pairs batchsize = 128*3 = 384
conf.training_set_path_multilabel_m384 = [path_triplet, '/training_set_ebay_multilabel_m384.lmdb'];
% for debugging
conf.training_imageset_path = [path_triplet, '/training_imageset_ebay.lmdb'];
conf.training_set_path_triplet = [path_triplet, '/training_set_triplet.lmdb'];
conf.validation_set_path_triplet = [path_triplet, '/validation_set_triplet.lmdb'];
conf.validation_set_path = [path_triplet, '/validation_set_cars196.lmdb'];
2.2. Dataset Split
Use code/gen_splits.m to generate the train/test split and save it in mat format; the main lines to modify are the following:
function gen_splits
conf = config;
%% generate splits
[image_ids, class_ids, superclass_ids, path_list] = ...
textread('/path/to/Ebay_train.txt', '%d %d %d %s',...
'headerlines', 1); % path to Ebay_train.txt (modify this path)
[image_ids, class_ids, superclass_ids, path_list] = ...
textread('/path/to/Ebay_test.txt', '%d %d %d %s',...
'headerlines', 1); % path to Ebay_test.txt (modify this path)
savepath = [conf.root_path, 'splits.mat'];
save(savepath, 'train_images', 'val_images', 'dict');
2.3. Cropping the Dataset Images
Use code/gen_images.m to crop every image in the dataset to (256, 256) and save the result in mat format. Depending on the number of images the mat file can become very large; here it is about 6.5 GB. A rough sketch of this kind of preprocessing is given after the function list below.
The functions involved are:
- check_images(conf.image_path, train_images); % check that the image files are valid
- load_cropped_images(conf.image_path, images, conf.preprocessing.crop_padding, conf.preprocessing.square_size); % crop the images and return them as a struct array images
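As a rough illustration only (the repo does this in MATLAB, and the exact padding convention of load_cropped_images may differ), the Python sketch below shows this kind of preprocessing, assuming crop_padding = 15 and square_size = 256; the function name is illustrative and PIL/numpy are assumed to be available.
# Minimal sketch of the pad-crop + resize preprocessing (illustrative, not load_cropped_images.m).
from PIL import Image
import numpy as np

def crop_and_resize(image_path, crop_padding=15, square_size=256):
    """Trim crop_padding pixels from each border, then resize to (square_size, square_size)."""
    img = Image.open(image_path).convert('RGB')
    w, h = img.size
    if w > 2 * crop_padding and h > 2 * crop_padding:
        img = img.crop((crop_padding, crop_padding, w - crop_padding, h - crop_padding))
    img = img.resize((square_size, square_size), Image.BILINEAR)
    return np.asarray(img, dtype=np.uint8)  # (256, 256, 3) uint8 array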
2.4. Compiling the cpp MEX Functions
Creating the DB-format training and test datasets requires MATLAB to call two cpp functions, imageset_to_leveldb.cpp and serialized_pairs_to_leveldb.cpp, i.e., run code/compile.m:
mex imageset_to_leveldb.cpp -lcaffe -lglog -lgflags -lprotobuf -lleveldb -llmdb ...
-L/path/to/caffe/build/lib ...
-I/path/to/caffe/build/src/caffe/proto
mex serialized_pairs_to_leveldb.cpp -lcaffe -lglog -lgflags -lprotobuf -lleveldb -llmdb ...
-L/path/to/caffe/build/lib ...
-I/path/to/caffe/build/src/caffe/proto
Note: a libcaffe.so.1.0.0-rc3 not found error may occur; the fix is to create a symlink to the Caffe library inside the MATLAB installation path:
sudo ln -s /path/to/caffe/build/lib/libcaffe.so.1.0.0-rc3 /path/to/matlab/bin/glnxa64/
3. Retrieval Test
Since only product image retrieval is of interest here, only code/evaluation/evaluate_recall.m is used, to evaluate recall@K.
3.1. Feature Extraction
The paper provides models trained on three datasets: Cars196, CUB200, and OnlineProducts. Here the OnlineProducts model is used, and features are extracted with this pre-trained model.
3.1.1. Creating the Test Image Database
First, use code/gen_caffe_validation_imageset.m to build the test image database (validation_set_ebay.lmdb); the cpp function imageset_to_leveldb that it calls sets the label of every test image to 0. There are 60502 test images in total. A rough sketch of what this step produces is shown below.
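For reference only, the following minimal Python sketch shows the general idea of the result: an LMDB of Caffe Datum records in Ebay_test.txt order, each with label 0. It is not the repository's MATLAB/mex pipeline; write_validation_lmdb, image_arrays, and the map_size value are placeholders.
# Minimal sketch: write test images into an LMDB of Caffe Datum records with label = 0.
import lmdb
import numpy as np
import caffe

def write_validation_lmdb(image_arrays, lmdb_path):
    """image_arrays: iterable of HxWx3 uint8 arrays in Ebay_test.txt order."""
    env = lmdb.open(lmdb_path, map_size=1 << 40)  # generous map size
    with env.begin(write=True) as txn:
        for idx, img in enumerate(image_arrays):
            chw = np.transpose(img, (2, 0, 1))             # Datum expects channels x height x width
            datum = caffe.io.array_to_datum(chw, label=0)  # every test image gets label 0
            txn.put('{:0>10d}'.format(idx).encode('ascii'), datum.SerializeToString())
    env.close()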
3.1.2. Modifying the Dataset Path in the prototxt
Modify the test dataset path in model/extract_googlenet*.prototxt. Network definitions are provided for extracting features of dimension [64, 128, 256, 512]; the only difference between them is the num_output of the fc_embedding layer.
layer {
name: "data"
type: "Data"
top: "data"
top: "label"
include {
phase: TEST
}
transform_param {
crop_size: 227
mean_file: "/path/to/ebay/cache/imagenet_mean.binaryproto"
}
data_param {
source: "/path/to/ebay/cache/validation_set_ebay.lmdb"
batch_size: 26
backend: LMDB
}
}
3.1.3. Feature Extraction
Use code/compute_googlenet_distance_matrix_cuda_embeddings_liftedstructsim_softmax_pair_m128.py to extract features with the pre-trained model and save the feature matrix in mat format. Note that batch_size = 26 divides the 60502 test images exactly (26 × 2327 = 60502), so every test image is processed.
#!/usr/bin/python
# coding: utf-8
import numpy as np
import matplotlib.pyplot as plt
import caffe
import scipy.io as io
import sys
embedding_dimension = 64 # dimension of the extracted features
print 'Embedding dim: %d' % embedding_dimension
MODEL_FILE = 'model/extract_googlenet_ebay_feature_embed%d.prototxt' % (embedding_dimension)
PRETRAINED = 'snapshot_ebay_googlenet_finetune_liftedstructsim_softmax_pair_m128_multilabel_embed%d_baselr_1E4_iter_20000.caffemodel' % embedding_dimension
MEAN_FILE = 'cache/imagenet_mean.binaryproto'
caffe.set_device(0)
caffe.set_mode_gpu()
net = caffe.Net(MODEL_FILE, PRETRAINED, caffe.TEST)
import lmdb
LMDB_FILENAME = '/path/to/ebay/cache/validation_set_ebay.lmdb'
lmdb_env = lmdb.open(LMDB_FILENAME)
lmdb_txn = lmdb_env.begin()
lmdb_cursor = lmdb_txn.cursor()
num_imgs = 0
for key in lmdb_cursor:
num_imgs += 1
print 'Num images: %d' %num_imgs
batchsize = 26
num_batches = num_imgs / batchsize
print 'Num batches: %d' %num_batches
# Store fc features for all images
fc_feat_dim = embedding_dimension
feat_matrix = np.zeros((num_imgs, fc_feat_dim), dtype=np.float32)
filename_list = []
filename_idx = 0
for batch_id in range(num_batches):
batch = net.forward()
fc = net.blobs['fc_embedding'].data.copy()
for i in range(batchsize):
feat_matrix[filename_idx+i, :] = fc[i,:]
filename_idx += batchsize
a = {}
a['fc_embedding'] = feat_matrix
io.savemat('features/validation_googlenet_feat_matrix_liftedstructsim_softmax_pair_m128_multilabel_embed%d_baselr_1E4_gaussian2k.mat' % embedding_dimension, a)
print "all zeros?: ", np.allclose(feat_matrix, 0)
3.2. Image Retrieval
evaluate_recall.m
function evaluate_recall(embedding_dimension)
fprintf('embedding_dimension %d\n', embedding_dimension); % here embedding_dimension = 64
name = 'liftedstructsim_softmax_pair_m128_multilabel';
feature_filename = ...
sprintf('/path/to/features/validation_googlenet_feat_matrix_%s_embed%d_baselr_1E4_gaussian2k.mat', ...
name, embedding_dimension);
try
load(feature_filename, 'fc_embedding');
catch
error('feature file does not exist.\n');
end
features = fc_embedding;
dims = size(features);
assert(dims(2) == embedding_dimension);
assert(dims(1) == 60502);
%D = squareform(pdist(features, 'euclidean'));
D2 = distance_matrix(features); % this step is memory- and time-hungry; with 32 GB RAM plus 32 GB swap, the distance matrix could only be computed for the 64-dimensional features on this machine.
[image_ids, class_ids, superclass_ids, path_list] = ...
textread('/path/to/Ebay_test.txt', '%d %d %d %s',...
'headerlines', 1);
% set diagonal to very high number
assert(dims(1) == length(class_ids));
num = dims(1);
D = sqrt(abs(D2));
D(1:num+1:num*num) = inf;
for K = [1, 10, 100, 1000]
compute_recall_at_K(D, K, class_ids, num);
end
disp('done');
% compute pairwise distance matrix
function D = distance_matrix(X)
m = size(X, 1);
t = ones(m, 1);
x = zeros(m, 1);
for i = 1:m
n = norm(X(i,:));
x(i) = n * n;
end
D = x * t' + t * x' - 2 * X * X'; % squared distances: ||x_i||^2 + ||x_j||^2 - 2*x_i'*x_j
% compute recall@K
function recall = compute_recall_at_K(D, K, class_ids, num)
num_correct = 0;
for i = 1 : num
this_gt_class_idx = class_ids(i);
this_row = D(i,:);
[~, inds] = sort(this_row, 'ascend');
knn_inds = inds(1:K);
knn_class_inds = class_ids(knn_inds);
if sum(ismember(knn_class_inds, this_gt_class_idx)) > 0
num_correct = num_correct + 1;
end
end
recall = num_correct / num;
fprintf('K: %d, Recall: %.3f\n', K, recall);
The recall@K results with the 64-dimensional features are:
K=1, 0.556
K=10, 0.756
K=100, 0.895
K=1000, 0.971
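As the comment in evaluate_recall.m notes, building the full 60502 x 60502 distance matrix is the memory bottleneck, which is why only the 64-dimensional features could be evaluated on a 32 GB machine. The minimal Python sketch below computes recall@K in row chunks without ever materializing the full matrix; it assumes features (an N x d array) and class_ids (length N, in Ebay_test.txt order) are already loaded, and the function name is illustrative.
# Minimal sketch: chunked recall@K that avoids the full N x N distance matrix.
import numpy as np

def recall_at_k(features, class_ids, ks=(1, 10, 100, 1000), chunk=1000):
    n = features.shape[0]
    sq_norms = (features ** 2).sum(axis=1)            # ||x_i||^2 for every row
    hits = {k: 0 for k in ks}
    for start in range(0, n, chunk):
        stop = min(start + chunk, n)
        # Squared distances from this chunk of queries to all images.
        d2 = (sq_norms[start:stop, None] + sq_norms[None, :]
              - 2.0 * features[start:stop].dot(features.T))
        d2[np.arange(stop - start), np.arange(start, stop)] = np.inf  # exclude self-matches
        order = np.argsort(d2, axis=1)                # nearest neighbors first
        for k in ks:
            knn_classes = class_ids[order[:, :k]]     # classes of the k nearest neighbors
            hits[k] += (knn_classes == class_ids[start:stop, None]).any(axis=1).sum()
    return {k: hits[k] / float(n) for k in ks}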
4. Model Training
[1] - Create the DB-format training dataset
Use code/gen_caffe_dataset_multilabel_m128.m to create the training dataset; this generates the training images and multiclass labels and stores them in DB format. The result is large: with 59551 images and batchsize = 64, the final lmdb with multiclass labels comes to nearly 200 GB.
[2] - Modify model/train_val_googlenet_finetune_liftedstructsim_softmax_pair_m128_multilabel_embed64.prototxt and model/solver_googlenet_finetune_liftedstructsim_softmax_pair_m128_multilabel_embed64_baselr_1E4.prototxt. The ImageNet mean file imagenet_mean.binaryproto must be provided here.
[3] - Model training
/path/to/caffe/build/tools/caffe train \
--solver=model/solver_googlenet_finetune_liftedstructsim_softmax_pair_m128_multilabel_embed64_baselr_1E4.prototxt \
--weights=model/bvlc_googlenet.caffemodel \
--gpu 0 2>&1 | tee model/caffemodel/ebay_googlenet_m64.log
Training with the provided pipeline requires building a very large DB-format dataset; with an even larger dataset this would become impractical. In fact the DB-format dataset is not strictly necessary: the multiclass labels generated in gen_caffe_dataset_multilabel_m128.m can be used to read the data directly from the image files. Note that the batchsize used when generating the multiclass labels must match the batchsize in train_val_googlenet_*.prototxt in order to obtain the same retrieval results.
The new Caffe layer added in the provided code is lifted_struct_similarity_softmax_layer; it computes its loss over all the samples within a batch, so training depends on how each batch is composed. A rough numpy sketch of the loss it computes over one batch follows.
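To make the role of the batch explicit, below is a minimal numpy sketch of the lifted structured loss from the paper, computed over one batch of embeddings: for each positive pair (i, j) it accumulates max(0, J_ij)^2 with J_ij = log(sum_k exp(alpha - D_ik) + sum_l exp(alpha - D_jl)) + D_ij, averaged over 2|P|. This is an illustrative re-implementation, not the code of lifted_struct_similarity_softmax_layer; the margin value and the omission of numerical-stability tricks are assumptions.
# Minimal numpy sketch of the lifted structured loss over one batch (forward pass only).
# embeddings: (B, d) array from fc_embedding; labels: (B,) class ids; margin: alpha.
import numpy as np

def lifted_struct_loss(embeddings, labels, margin=1.0):
    b = embeddings.shape[0]
    sq = (embeddings ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * embeddings.dot(embeddings.T)
    dist = np.sqrt(np.maximum(d2, 0.0))              # pairwise Euclidean distances D_ij
    same = labels[:, None] == labels[None, :]        # positive-pair mask
    loss, num_pos = 0.0, 0
    for i in range(b):
        for j in range(i + 1, b):
            if not same[i, j]:
                continue                             # only positive pairs (i, j) contribute
            num_pos += 1
            neg_i = dist[i, ~same[i]]                # distances from i to its negatives
            neg_j = dist[j, ~same[j]]                # distances from j to its negatives
            # J_ij = log( sum_k exp(margin - D_ik) + sum_l exp(margin - D_jl) ) + D_ij
            j_ij = np.log(np.exp(margin - neg_i).sum()
                          + np.exp(margin - neg_j).sum()) + dist[i, j]
            loss += max(0.0, j_ij) ** 2
    return loss / (2.0 * max(num_pos, 1))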