Paper: Deep Metric Learning via Lifted Structured Feature Embedding
Project: Caffe - Deep-Metric-Learning-CVPR16/

1. Preparation

2. Data preparation

2.1. Dataset format

Ebay_info.txt

Ebay_train.txt

Ebay_test.txt

The contents of these files have the following format:

image_id class_id super_class_id path
1 1 1 bicycle_final/111085122871_0.JPG
2 1 1 bicycle_final/111085122871_1.JPG
3 1 1 bicycle_final/111085122871_2.JPG
......
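  • image_id - image ID
  • class_id - product (item) ID
  • super_class_id - category ID of the image
  • path - image path

Note: image_id and class_id run consecutively across Ebay_train.txt and Ebay_test.txt; the first sample in Ebay_test.txt continues directly from the last sample in Ebay_train.txt.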
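
The file format and the numbering note above can be checked with a few lines of Python. This is only a minimal sketch for Python 3: the names Record, parse_split and is_continuation are hypothetical helpers, not part of the project, and the two sample strings reuse the example rows shown above as a made-up train/test split.

```python
from collections import namedtuple

# Hypothetical helper type: one row of an Ebay_*.txt file.
Record = namedtuple("Record", "image_id class_id super_class_id path")

def parse_split(text):
    """Parse the contents of an Ebay_*.txt file into a list of Record tuples."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    records = []
    for ln in lines[1:]:  # skip the header row
        image_id, class_id, super_class_id, path = ln.split(None, 3)
        records.append(Record(int(image_id), int(class_id),
                              int(super_class_id), path))
    return records

def is_continuation(train, test):
    """True if the test image_ids start right after the last train image_id."""
    return test[0].image_id == train[-1].image_id + 1

# Made-up split built from the example rows above (illustration only).
sample_train = """image_id class_id super_class_id path
1 1 1 bicycle_final/111085122871_0.JPG
2 1 1 bicycle_final/111085122871_1.JPG
"""
sample_test = """image_id class_id super_class_id path
3 1 1 bicycle_final/111085122871_2.JPG
"""

train_records = parse_split(sample_train)
test_records = parse_split(sample_test)
print(len(train_records), len(test_records),
      is_continuation(train_records, test_records))
```

For the real files, read each file's text (for example with open(path).read()) and pass it to parse_split.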

config.m settings:

function conf = config()
    conf = struct();

    %% Directories
    conf.root_path = '/path/to/Deep-Metric-Learning-CVPR16/code/ebay/';
    conf.cache_path = '/path/to/Deep-Metric-Learning-CVPR16/code/ebay/cache';
    conf.image_path = '/path/to/Ebay_Dataset/';

    %% Training parameters
    conf.preprocessing.crop_padding = 15;
    conf.preprocessing.square_size = 256;
    conf.preprocessing.num_to_load = 255;
    conf.preprocessed_image_file = [conf.cache_path, '/training_images.mat'];

    path_triplet = '/path/to/Deep-Metric-Learning-CVPR16/code/ebay/cache';

    % for multilabel pairs batchsize = 64
    conf.training_set_path_multilabel_m64 = [path_triplet, '/training_set_ebay_multilabel_m64.lmdb'];

    % for multilabel pairs batchsize = 128
    conf.training_set_path_multilabel_m128 = [path_triplet, '/training_set_ebay_multilabel_m128.lmdb'];

    % for multilabel pairs batchsize = 128*2 = 256
    conf.training_set_path_multilabel_m256 = [path_triplet, '/training_set_ebay_multilabel_m256.lmdb'];

    % for multilabel pairs batchsize = 128*3 = 384
    conf.training_set_path_multilabel_m384 = [path_triplet, '/training_set_ebay_multilabel_m384.lmdb'];

    % for debugging
    conf.training_imageset_path = [path_triplet, '/training_imageset_ebay.lmdb'];

    conf.training_set_path_triplet = [path_triplet, '/training_set_triplet.lmdb'];
    conf.validation_set_path_triplet = [path_triplet, '/validation_set_triplet.lmdb'];
    conf.validation_set_path = [path_triplet, '/validation_set_cars196.lmdb'];

2.2. Dataset splits

Use code/gen_splits.m to create the train/test split in mat format. The main changes are as follows:

function gen_splits
    conf = config;
    
    %% generate splits
    [image_ids, class_ids, superclass_ids, path_list] = ...
        textread('/path/to/Ebay_train.txt', '%d %d %d %s',...
        'headerlines', 1); % path to train.txt

    [image_ids, class_ids, superclass_ids, path_list] = ...
        textread('/path/to/Ebay_test.txt', '%d %d %d %s',...
        'headerlines', 1); % path to test.txt

    savepath = [conf.root_path, 'splits.mat'];
    save(savepath, 'train_images', 'val_images', 'dict');

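2.3. Image cropping

Use code/gen_images.m to crop every image in the dataset to (256, 256) and save the result in mat format. Depending on the number of images, the mat file can become very large; here it is about 6.5 GB.

The functions involved are:

  1. check_images(conf.image_path, train_images); % check that the image format is valid
  2. load_cropped_images(conf.image_path, images, conf.preprocessing.crop_padding, conf.preprocessing.square_size); % crop the images and return them as images in struct format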

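2.4. Compiling the cpp files

Creating the DB-format training and test datasets requires calling two cpp functions from MATLAB, imageset_to_leveldb.cpp and serialized_pairs_to_leveldb.cpp, so compile them first by running code/compile.m.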

mex imageset_to_leveldb.cpp -lcaffe -lglog -lgflags -lprotobuf -lleveldb -llmdb ...
     -L/path/to/caffe/build/lib ...
     -I/path/to/caffe/build/src/caffe/proto
    
mex serialized_pairs_to_leveldb.cpp -lcaffe -lglog -lgflags -lprotobuf -lleveldb -llmdb ...
     -L/path/to/caffe/build/lib ...
     -I/path/to/caffe/build/src/caffe/proto

Note: a "libcaffe.so.1.0.0-rc3 not found" error may occur. The fix is to create a symbolic link to the Caffe library in the MATLAB installation directory:

sudo ln -s /path/to/caffe/build/lib/libcaffe.so.1.0.0-rc3 /path/to/matlab/bin/glnxa64/

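3. Retrieval test

Since the goal here is product image retrieval only, just code/evaluation/evaluate_recall.m is used, to evaluate recall@K.

3.1. Feature extraction

The paper provides models trained on three datasets: Cars196, CUB200 and OnlineProduct. The OnlineProduct dataset is used here, and features are extracted with the pretrained model.

3.1.1. Creating the LevelDB test data

First use code/gen_caffe_validation_imageset.m to create the LevelDB test image dataset. The cpp function involved, imageset_to_leveldb, sets the label of every test image to 0. There are 60502 test images in total.

3.1.2. Updating the dataset path in the prototxt

Update the test dataset path in model/extract_googlenet*.prototxt. Prototxt network definitions are provided for extracting [64, 128, 256, 512]-dimensional features; they differ only in the num_output of the fc_embedding layer.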

layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    crop_size: 227
    mean_file: "/path/to/ebay/cache/imagenet_mean.binaryproto"
  }
  data_param {
    source: "/path/to/ebay/cache/validation_set_ebay.lmdb"
    batch_size: 26
    backend: LMDB
  }
}

3.1.3. Feature extraction

Use code/compute_googlenet_distance_matrix_cuda_embeddings_liftedstructsim_softmax_pair_m128.py to extract features with the model and save the feature matrix in mat format.

#!/usr/bin/python
# coding: utf-8
    
import numpy as np
import matplotlib.pyplot as plt
import caffe
import scipy.io as io
import sys
    
embedding_dimension = 64 # dimension of the extracted features
print 'Embedding dim: %d' % embedding_dimension
    
MODEL_FILE = 'model/extract_googlenet_ebay_feature_embed%d.prototxt' % (embedding_dimension)
    
PRETRAINED = 'snapshot_ebay_googlenet_finetune_liftedstructsim_softmax_pair_m128_multilabel_embed%d_baselr_1E4_iter_20000.caffemodel' % embedding_dimension
    
MEAN_FILE = 'cache/imagenet_mean.binaryproto'
    
caffe.set_device(0)
caffe.set_mode_gpu()
    
net = caffe.Net(MODEL_FILE, PRETRAINED, caffe.TEST)
    
import lmdb
LMDB_FILENAME = '/path/to/ebay/cache/validation_set_ebay.lmdb'
lmdb_env = lmdb.open(LMDB_FILENAME)
lmdb_txn = lmdb_env.begin()
lmdb_cursor = lmdb_txn.cursor()
    
num_imgs = 0
for key in lmdb_cursor:
    num_imgs += 1
print 'Num images: %d' %num_imgs
    
batchsize = 26
num_batches = num_imgs // batchsize  # integer division; images beyond the last full batch are skipped
print 'Num batches: %d' %num_batches
    
# Store fc features for all images
fc_feat_dim = embedding_dimension
    
feat_matrix = np.zeros((num_imgs, fc_feat_dim), dtype=np.float32)
filename_list = []
    
filename_idx = 0
for batch_id in range(num_batches):
    batch = net.forward()
    fc = net.blobs['fc_embedding'].data.copy()
    for i in range(batchsize):
        feat_matrix[filename_idx+i, :] = fc[i,:]
    
    filename_idx += batchsize
    
a = {}
a['fc_embedding'] = feat_matrix
io.savemat('features/validation_googlenet_feat_matrix_liftedstructsim_softmax_pair_m128_multilabel_embed%d_baselr_1E4_gaussian2k.mat' % embedding_dimension, a)
    
print "all zeros?: ", np.allclose(feat_matrix, 0)
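
The extraction loop above only runs full batches, so any images left over after the last full batch would keep all-zero rows in feat_matrix. A small sketch of that arithmetic (plan_batches is a hypothetical helper, not part of the project) shows that a batch size of 26 divides the 60502 test images exactly; the batch_size in the prototxt is assumed to match the batchsize in the script, since the script reads that many rows from each forward pass.

```python
def plan_batches(num_imgs, batchsize):
    """Return (number of full batches, images left over after them)."""
    return divmod(num_imgs, batchsize)

# Values taken from the text: 60502 test images, batch size 26.
full, leftover = plan_batches(60502, 26)
print("full batches: %d, leftover images: %d" % (full, leftover))
# prints "full batches: 2327, leftover images: 0"
```

With a different batch size, a non-zero leftover means the last images never pass through the network and would need one extra, partially used batch.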

3.2. Image retrieval

evaluate_recall.m

function evaluate_recall(embedding_dimension)
    
    fprintf('embedding_dimension %d\n', embedding_dimension); % embedding_dimension = 64 here
    
    name = 'liftedstructsim_softmax_pair_m128_multilabel';
    feature_filename = ...
        sprintf('/path/to/features/validation_googlenet_feat_matrix_%s_embed%d_baselr_1E4_gaussian2k.mat', ...
        name, embedding_dimension);
    
    try
        load(feature_filename, 'fc_embedding');
    catch
        error('filename is non existent.\n');
    end
    
    features = fc_embedding;
    
    dims = size(features);
    assert(dims(2) == embedding_dimension);
    assert(dims(1) == 60502);
    
    %D = squareform(pdist(features, 'euclidean'));
    D2 = distance_matrix(features); % memory- and time-intensive: on this machine (32 GB RAM plus 32 GB swap) the distance matrix could only be computed for the 64-dimensional features.
    
    [image_ids, class_ids, superclass_ids, path_list] = ...
        textread('/path/to/Ebay_test.txt', '%d %d %d %s',...
        'headerlines', 1); 
    
    % set diagonal to very high number
    assert(dims(1) == length(class_ids));
    num = dims(1);
    D = sqrt(abs(D2));
    D(1:num+1:num*num) = inf;
    
    for K = [1, 10, 100, 1000]
        compute_recall_at_K(D, K, class_ids, num);
    end
    
    disp('done');
    
    % compute pairwise distance matrix
    function D = distance_matrix(X)
    
    m = size(X, 1);
    t = ones(m, 1);
    x = zeros(m, 1);
    for i = 1:m
        n = norm(X(i,:));
        x(i) = n * n;
    end
    
    D = x * t' + t * x' - 2 * X * X'; %%%%%%
    
    % compute recall@K
    function recall = compute_recall_at_K(D, K, class_ids, num)
    
    num_correct = 0;
    for i = 1 : num
        this_gt_class_idx = class_ids(i);
        this_row = D(i,:);
        [~, inds] = sort(this_row, 'ascend');
        knn_inds = inds(1:K);
    
        knn_class_inds = class_ids(knn_inds);
    
        if sum(ismember(knn_class_inds, this_gt_class_idx)) > 0
            num_correct = num_correct + 1;
        end
    end
    recall = num_correct / num;
    fprintf('K: %d, Recall: %.3f\n', K, recall);

Recall@K results with the 64-dimensional features:

K=1, 0.556
K=10, 0.756
K=100, 0.895
K=1000, 0.971

4. Model training

[1] - Create the LevelDB training dataset

Use code/gen_caffe_dataset_multilabel_m128.m to create the training dataset. This generates the training images and labels and saves them in DB format. The resulting dataset is large: with 59551 images and batchsize=64, the final LMDB with the generated multiclass labels is close to 200 GB.

[2] - Edit model/train_val_googlenet_finetune_liftedstructsim_softmax_pair_m128_multilabel_embed64.prototxt and model/solver_googlenet_finetune_liftedstructsim_softmax_pair_m128_multilabel_embed64_baselr_1E4.prototxt. The ImageNet mean file imagenet_mean.binaryproto must be provided here.

[3] - Train the model

/path/to/caffe/build/tools/caffe train \
      --solver=model/solver_googlenet_finetune_liftedstructsim_softmax_pair_m128_multilabel_embed64_baselr_1E4.prototxt \
      --weights=model/bvlc_googlenet.caffemodel \
      --gpu 0 2>&1 | tee model/caffemodel/ebay_googlenet_m64.log

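This training approach requires building a large DB-format dataset, which can become impractical for even larger datasets. In practice the DB can be skipped: the multiclass labels created in gen_caffe_dataset_multilabel_m128.m can be used to read data directly from the images. Note that the batchsize used when generating the multiclass labels should match the batchsize in train_val_googlenet_*.prototxt; the same image retrieval performance can then be obtained.

The layer added to Caffe in the provided source is lifted_struct_similarity_softmax_layer, which relies on the batchsize to train the network on the data within each batch.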
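
To see why the loss depends on the whole batch, here is a minimal NumPy sketch of the lifted structured loss as formulated in the paper: for each positive pair (i, j), J_ij = D_ij + log(sum over negatives k of i of exp(margin - D_ik) + sum over negatives l of j of exp(margin - D_jl)), and the batch loss is the mean of max(0, J_ij)^2 over positive pairs, divided by 2. It is an illustration under assumptions, not the Caffe layer itself: lifted_struct_loss and the margin value of 1.0 are chosen here for the example, and lifted_struct_similarity_softmax_layer may differ in its details.

```python
import numpy as np

def lifted_struct_loss(embeddings, labels, margin=1.0):
    """Lifted structured loss over one batch, following the paper's formula."""
    x = np.asarray(embeddings, dtype=np.float64)
    labels = np.asarray(labels)
    sq = np.sum(x * x, axis=1)
    # Pairwise Euclidean distances between all samples in the batch.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * x @ x.T, 0.0)
    dist = np.sqrt(d2)
    same = labels[:, None] == labels[None, :]
    # For each anchor i: sum of exp(margin - D_ik) over its negatives k.
    neg_exp_sum = np.sum(np.where(~same, np.exp(margin - dist), 0.0), axis=1)
    total, num_pos = 0.0, 0
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            if not same[i, j]:
                continue
            s = neg_exp_sum[i] + neg_exp_sum[j]
            # No negatives in the batch means the pair contributes nothing.
            j_ij = dist[i, j] + (np.log(s) if s > 0 else -np.inf)
            total += max(0.0, j_ij) ** 2
            num_pos += 1
    return total / (2.0 * num_pos) if num_pos else 0.0

# Made-up batches: well separated classes versus fully overlapping classes.
separated = lifted_struct_loss([[0, 0], [0, 0.1], [100, 0], [100, 0.1]],
                               [0, 0, 1, 1])
overlapping = lifted_struct_loss([[0, 0], [1, 0], [0, 0], [1, 0]],
                                 [0, 0, 1, 1])
print(separated, overlapping)
```

Every positive pair is compared against all negatives of both samples in the batch, so the loss needs batches that contain both positive pairs and negatives.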

Last modification: February 4th, 2021 at 02:06 pm