Training detectron2 on a Custom Dataset.

GitHub project - detectron2 installation and basic usage - AIUAI

Using the balloon segmentation dataset as an example, this article walks through training a detectron2 model on a custom dataset. The dataset has only one class: balloon.

1. Data Preparation

[1] - Download the dataset:

wget https://github.com/matterport/Mask_RCNN/releases/download/v2.1/balloon_dataset.zip
unzip balloon_dataset.zip

2. Dataset Registration

Following the detectron2 custom dataset tutorial, the balloon dataset must first be registered with detectron2.

This involves two steps:

[1] - Register the dataset (Register datasets), which tells detectron2 how to obtain it.

[2] - Register the dataset's metadata (Register metadata, optional).

2.1. COCO-Format Data

For data already in COCO JSON format, detectron2 provides a built-in loader:

detectron2.data.datasets.load_coco_json(json_file, image_root, dataset_name=None, extra_annotation_keys=None)

Registering a COCO-format dataset is then a single call per split:

from detectron2.data.datasets import register_coco_instances

register_coco_instances("my_dataset_train", {}, "json_annotation_train.json", "path/to/image/dir")
register_coco_instances("my_dataset_val", {}, "json_annotation_val.json", "path/to/image/dir")
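Once registered, the dataset and its metadata can be retrieved by name. As a quick sanity check (a minimal sketch, assuming the JSON and image paths above point to real data):

from detectron2.data import DatasetCatalog, MetadataCatalog

# retrieve the loaded list[dict] and the metadata filled in from the COCO JSON
dataset_dicts = DatasetCatalog.get("my_dataset_train")
metadata = MetadataCatalog.get("my_dataset_train")
print(len(dataset_dicts), metadata.thing_classes)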

2.2. Non-COCO-Format Data

2.2.1. Register datasets

To register a dataset, define a function that loads the dataset into detectron2's standard format, then register that function with detectron2's DatasetCatalog:

def my_dataset_function():
    # ... load the annotations ...
    return list_of_dicts  # list[dict], one dict per image, in detectron2's standard format

from detectron2.data import DatasetCatalog
DatasetCatalog.register("my_dataset", my_dataset_function)

Each dict corresponds to one image and its annotations. Per the Standard Dataset Dicts documentation, the main fields are:

Image-level fields:

  • file_name: absolute path to the image file
  • height, width: the height and width of the image
  • image_id (str or int): a unique identifier for the image
  • annotations (list[dict]): one dict per instance, holding its annotation, e.g. box and mask

Instance-level fields:

  • bbox (list[float]): the object's bounding box, 4 floats in xyxy or xywh order
  • bbox_mode (int): the format of bbox, either BoxMode.XYXY_ABS or BoxMode.XYWH_ABS
  • category_id (int): the class label, an integer in [0, num_categories-1]; detectron2 reserves the value num_categories for the background class.
  • segmentation (list[list[float]] or dict): the instance segmentation annotation.

[1] - If it is a list[list[float]], it is a list of polygons, one per connected component of the object. Each list[float] is one polygon in the format [x1, y1, ..., xn, yn], where the xs and ys are absolute coordinates in pixels.

[2] - If it is a dict, it is a per-pixel segmentation mask in COCO's compressed RLE format. The dict must have two fields, size and counts. A uint8 mask of 0s and 1s can be converted to this format with pycocotools.mask.encode(np.asarray(mask, order="F")) (see the sketch after this list). In this case, if the default dataloader is used, cfg.INPUT.MASK_FORMAT must be set to bitmask.

  • iscrowd (int): 0 (default) or 1, whether the instance is labeled as a COCO crowd region. Don't use this field unless you know what it means.
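Putting the fields together, a single standard dataset dict might look like the following sketch (the paths and coordinates are illustrative values, not from the balloon dataset):

import numpy as np
from detectron2.structures import BoxMode
from pycocotools import mask as mask_util

# one image with one polygon-annotated instance (illustrative values)
record = {
    "file_name": "/path/to/images/0001.jpg",
    "height": 480,
    "width": 640,
    "image_id": 1,
    "annotations": [{
        "bbox": [100.0, 150.0, 300.0, 350.0],
        "bbox_mode": BoxMode.XYXY_ABS,
        "segmentation": [[100.0, 150.0, 300.0, 150.0, 300.0, 350.0, 100.0, 350.0]],
        "category_id": 0,
        "iscrowd": 0,
    }],
}

# alternatively, encode a binary uint8 mask as COCO compressed RLE
# (requires cfg.INPUT.MASK_FORMAT = "bitmask" with the default dataloader)
bitmask = np.zeros((480, 640), dtype=np.uint8)
bitmask[150:350, 100:300] = 1
rle = mask_util.encode(np.asarray(bitmask, order="F"))  # dict with "size" and "counts"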

2.2.2. Register metadata

Each dataset has associated metadata, accessed through MetadataCatalog.get(dataset_name).some_metadata.

Metadata is a key-value mapping holding information shared across the whole dataset, typically used to interpret it: the number of classes, class colors, the file root directory, and so on. It is useful for data augmentation, evaluation, visualization, logging, etc.

For a dataset registered through DatasetCatalog.register, new metadata can likewise be added with MetadataCatalog.get(dataset_name).some_key = some_value.

For example:

from detectron2.data import MetadataCatalog
MetadataCatalog.get("my_dataset").thing_classes = ["person", "dog"]

The metadata keys used by detectron2's builtin features include:

  • thing_classes (list[str]): used by instance detection and instance segmentation tasks; the list of class names, one per instance/thing category.
  • thing_colors (list[tuple(r, g, b)]): a predefined color for each thing class, used for visualization. Random colors are used if not specified.
  • stuff_classes (list[str]): used by semantic segmentation and panoptic segmentation tasks; the list of class names, one per stuff category.
  • stuff_colors (list[tuple(r, g, b)]): a predefined color for each stuff class, used for visualization. Random colors are used if not specified.

See “Metadata” for Datasets for more.
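For instance, extending the example above, a dataset could pin both the class names and their display colors (a minimal sketch; the color values are arbitrary):

from detectron2.data import MetadataCatalog

meta = MetadataCatalog.get("my_dataset")
meta.thing_classes = ["person", "dog"]
meta.thing_colors = [(0, 114, 189), (217, 83, 25)]  # one (r, g, b) tuple per thing class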

2.2.3. Example - Balloon Dataset

The balloon dataset ships with annotations in its original VIA format (via_region_data.json) rather than COCO format.

[1] - First, define a function that loads the dataset into detectron2's standard format.

import os
import json
import cv2
import numpy as np
from detectron2.structures import BoxMode

def get_balloon_dicts(img_dir):
    json_file = os.path.join(img_dir, "via_region_data.json")
    with open(json_file) as f:
        imgs_anns = json.load(f)

    dataset_dicts = []
    for idx, v in enumerate(imgs_anns.values()):
        record = {}

        filename = os.path.join(img_dir, v["filename"])
        height, width = cv2.imread(filename).shape[:2]

        record["file_name"] = filename
        record["image_id"] = idx
        record["height"] = height
        record["width"] = width

        annos = v["regions"]
        objs = []
        for _, anno in annos.items():
            assert not anno["region_attributes"]
            anno = anno["shape_attributes"]
            px = anno["all_points_x"]
            py = anno["all_points_y"]
            poly = [(x + 0.5, y + 0.5) for x, y in zip(px, py)]
            poly = [p for x in poly for p in x]  # flatten to [x1, y1, ..., xn, yn]

            obj = {
                "bbox": [np.min(px), np.min(py), np.max(px), np.max(py)],
                "bbox_mode": BoxMode.XYXY_ABS,
                "segmentation": [poly],
                "category_id": 0,
                "iscrowd": 0,
            }
            objs.append(obj)
        record["annotations"] = objs
        dataset_dicts.append(record)
    return dataset_dicts

[2] - Register the dataset and its metadata:

from detectron2.data import DatasetCatalog, MetadataCatalog

for d in ["train", "val"]:
    DatasetCatalog.register("balloon/" + d, lambda d=d: get_balloon_dicts("balloon/" + d))
    MetadataCatalog.get("balloon/" + d).set(thing_classes=["balloon"])
balloon_metadata = MetadataCatalog.get("balloon/train")

[3] - Verify that the data loads correctly by visualizing the annotations of randomly sampled training images:

import random
import cv2
import matplotlib.pyplot as plt
from detectron2.utils.visualizer import Visualizer

dataset_dicts = get_balloon_dicts("balloon/train")
for d in random.sample(dataset_dicts, 3):
    img = cv2.imread(d["file_name"])
    # Visualizer expects an RGB image; cv2 reads BGR, hence the channel flip
    visualizer = Visualizer(img[:, :, ::-1], metadata=balloon_metadata, scale=0.5)
    vis = visualizer.draw_dataset_dict(d)
    plt.imshow(vis.get_image())  # get_image() is already RGB
    plt.show()

(Figure: three randomly sampled training images with their balloon annotations overlaid.)

3. Using the Dataset

Once a dataset is registered, it can be referred to by name in cfg.DATASETS.{TRAIN,TEST}, e.g. my_dataset.

In addition, the following config values may need updating (see the snippet after this list):

  • MODEL.ROI_HEADS.NUM_CLASSES and MODEL.RETINANET.NUM_CLASSES: the number of thing classes, for R-CNN and RetinaNet models respectively.
  • MODEL.ROI_KEYPOINT_HEAD.NUM_KEYPOINTS: the number of keypoints, for Keypoint R-CNN.
  • MODEL.SEM_SEG_HEAD.NUM_CLASSES: the number of stuff classes, for Semantic FPN and Panoptic FPN.

See Update the Config for New Datasets for more.
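For the single-class balloon dataset, the relevant updates reduce to the following (a sketch using the names registered above; the full training config appears in section 4):

cfg.DATASETS.TRAIN = ("balloon/train",)
cfg.DATASETS.TEST = ("balloon/val",)
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1  # one thing class: balloon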

detectron2's default dataloader can be used to load the data.

The detectron2.data module provides build_detection_train_loader and build_detection_test_loader to build the training and test dataloaders respectively.

For example, the training dataloader:

from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.data import build_detection_train_loader

cfg = get_cfg()
# use the config of the COCO-pretrained Mask R-CNN R50-FPN model
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.DATASETS.TRAIN = ("balloon/train",)
train_loader = build_detection_train_loader(cfg, mapper=None)  # mapper=None uses the default DatasetMapper

See Use Custom Dataloaders for writing your own mapper.
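As a sketch of what a custom mapper can look like (adapted from the Use Custom Dataloaders documentation; the fixed 800x800 resize is an arbitrary choice for illustration):

import copy
import torch
from detectron2.data import build_detection_train_loader
from detectron2.data import detection_utils as utils
import detectron2.data.transforms as T

def my_mapper(dataset_dict):
    # a trimmed-down version of the default DatasetMapper
    dataset_dict = copy.deepcopy(dataset_dict)  # it will be modified below
    image = utils.read_image(dataset_dict["file_name"], format="BGR")
    image, transforms = T.apply_transform_gens([T.Resize((800, 800))], image)
    dataset_dict["image"] = torch.as_tensor(image.transpose(2, 0, 1).astype("float32"))

    # apply the same geometric transforms to the annotations
    annos = [
        utils.transform_instance_annotations(obj, transforms, image.shape[:2])
        for obj in dataset_dict.pop("annotations")
        if obj.get("iscrowd", 0) == 0
    ]
    instances = utils.annotations_to_instances(annos, image.shape[:2])
    dataset_dict["instances"] = utils.filter_empty_instances(instances)
    return dataset_dict

train_loader = build_detection_train_loader(cfg, mapper=my_mapper)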

4. Model Training

Fine-tune a COCO-pretrained R50-FPN Mask R-CNN model. Training 300 iterations takes about 6 minutes on a Colab K80 GPU.

import os
from detectron2.engine import DefaultTrainer
from detectron2.config import get_cfg

cfg = get_cfg()
cfg.merge_from_file("./detectron2_repo/configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.DATASETS.TRAIN = ("balloon/train",)
cfg.DATASETS.TEST = ()  # no test set during training
cfg.DATALOADER.NUM_WORKERS = 2
cfg.MODEL.WEIGHTS = "detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl"  # fine-tune from COCO weights
cfg.SOLVER.IMS_PER_BATCH = 2
cfg.SOLVER.BASE_LR = 0.00025
cfg.SOLVER.MAX_ITER = 300
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 128  # faster, and good enough for this toy dataset
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1  # only one class (balloon)

os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()

Sample output:

WARNING [10/15 06:15:59 d2.config.compat]: Config './detectron2_repo/configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml' has no VERSION. Assuming it to be compatible with latest v2. [10/15 06:16:02 d2.engine.defaults]: Model: GeneralizedRCNN( (backbone): FPN( (fpn_lateral2): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1)) (fpn_output2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (fpn_lateral3): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1)) (fpn_output3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (fpn_lateral4): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1)) (fpn_output4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (fpn_lateral5): Conv2d(2048, 256, kernel_size=(1, 1), stride=(1, 1)) (fpn_output5): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (top_block): LastLevelMaxPool() (bottom_up): ResNet( (stem): BasicStem( (conv1): Conv2d( 3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05) ) ) (res2): Sequential( (0): BottleneckBlock( (shortcut): Conv2d( 64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv1): Conv2d( 64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05) ) (conv2): Conv2d( 64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05) ) (conv3): Conv2d( 64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) ) (1): BottleneckBlock( (conv1): Conv2d( 256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05) ) (conv2): Conv2d( 64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05) ) (conv3): Conv2d( 64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) ) (2): BottleneckBlock( (conv1): Conv2d( 256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05) ) (conv2): Conv2d( 64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05) ) (conv3): Conv2d( 64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) ) ) (res3): Sequential( (0): BottleneckBlock( (shortcut): Conv2d( 256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) (conv1): Conv2d( 256, 128, kernel_size=(1, 1), stride=(2, 2), bias=False (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05) ) (conv2): Conv2d( 128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05) ) (conv3): Conv2d( 128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) ) (1): BottleneckBlock( (conv1): Conv2d( 512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05) ) (conv2): Conv2d( 128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05) ) (conv3): Conv2d( 128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) ) (2): BottleneckBlock( 
(conv1): Conv2d( 512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05) ) (conv2): Conv2d( 128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05) ) (conv3): Conv2d( 128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) ) (3): BottleneckBlock( (conv1): Conv2d( 512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05) ) (conv2): Conv2d( 128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05) ) (conv3): Conv2d( 128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) ) ) (res4): Sequential( (0): BottleneckBlock( (shortcut): Conv2d( 512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05) ) (conv1): Conv2d( 512, 256, kernel_size=(1, 1), stride=(2, 2), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv2): Conv2d( 256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv3): Conv2d( 256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05) ) ) (1): BottleneckBlock( (conv1): Conv2d( 1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv2): Conv2d( 256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv3): Conv2d( 256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05) ) ) (2): BottleneckBlock( (conv1): Conv2d( 1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv2): Conv2d( 256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv3): Conv2d( 256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05) ) ) (3): BottleneckBlock( (conv1): Conv2d( 1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv2): Conv2d( 256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv3): Conv2d( 256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05) ) ) (4): BottleneckBlock( (conv1): Conv2d( 1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv2): Conv2d( 256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv3): Conv2d( 256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05) ) ) (5): BottleneckBlock( (conv1): Conv2d( 1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv2): Conv2d( 256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv3): Conv2d( 256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): 
FrozenBatchNorm2d(num_features=1024, eps=1e-05) ) ) ) (res5): Sequential( (0): BottleneckBlock( (shortcut): Conv2d( 1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05) ) (conv1): Conv2d( 1024, 512, kernel_size=(1, 1), stride=(2, 2), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) (conv2): Conv2d( 512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) (conv3): Conv2d( 512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05) ) ) (1): BottleneckBlock( (conv1): Conv2d( 2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) (conv2): Conv2d( 512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) (conv3): Conv2d( 512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05) ) ) (2): BottleneckBlock( (conv1): Conv2d( 2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) (conv2): Conv2d( 512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) (conv3): Conv2d( 512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05) ) ) ) ) ) (proposal_generator): RPN( (anchor_generator): DefaultAnchorGenerator( (cell_anchors): BufferList() ) (rpn_head): StandardRPNHead( (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (objectness_logits): Conv2d(256, 3, kernel_size=(1, 1), stride=(1, 1)) (anchor_deltas): Conv2d(256, 12, kernel_size=(1, 1), stride=(1, 1)) ) ) (roi_heads): StandardROIHeads( (box_pooler): ROIPooler( (level_poolers): ModuleList( (0): ROIAlign(output_size=(7, 7), spatial_scale=0.25, sampling_ratio=0, aligned=True) (1): ROIAlign(output_size=(7, 7), spatial_scale=0.125, sampling_ratio=0, aligned=True) (2): ROIAlign(output_size=(7, 7), spatial_scale=0.0625, sampling_ratio=0, aligned=True) (3): ROIAlign(output_size=(7, 7), spatial_scale=0.03125, sampling_ratio=0, aligned=True) ) ) (box_head): FastRCNNConvFCHead( (fc1): Linear(in_features=12544, out_features=1024, bias=True) (fc2): Linear(in_features=1024, out_features=1024, bias=True) ) (box_predictor): FastRCNNOutputLayers( (cls_score): Linear(in_features=1024, out_features=2, bias=True) (bbox_pred): Linear(in_features=1024, out_features=4, bias=True) ) (mask_pooler): ROIPooler( (level_poolers): ModuleList( (0): ROIAlign(output_size=(14, 14), spatial_scale=0.25, sampling_ratio=0, aligned=True) (1): ROIAlign(output_size=(14, 14), spatial_scale=0.125, sampling_ratio=0, aligned=True) (2): ROIAlign(output_size=(14, 14), spatial_scale=0.0625, sampling_ratio=0, aligned=True) (3): ROIAlign(output_size=(14, 14), spatial_scale=0.03125, sampling_ratio=0, aligned=True) ) ) (mask_head): MaskRCNNConvUpsampleHead( (mask_fcn1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (mask_fcn2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (mask_fcn3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (mask_fcn4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (deconv): ConvTranspose2d(256, 256, kernel_size=(2, 2), stride=(2, 2)) (predictor): Conv2d(256, 1, kernel_size=(1, 1), stride=(1, 1)) ) ) ) 
[10/15 06:16:04 d2.data.build]: Removed 0 images with no usable annotations. 61 images left. [10/15 06:16:04 d2.data.build]: Distribution of training instances among all 1 categories: | category | #instances | |:----------:|:-------------| | balloon | 255 | | | | [10/15 06:16:04 d2.data.detection_utils]: TransformGens used in training: [ResizeShortestEdge(short_edge_length=(640, 672, 704, 736, 768, 800), max_size=1333, sample_style='choice'), RandomFlip()] [10/15 06:16:04 d2.data.build]: Using training sampler TrainingSampler model_final_f10217.pkl: 178MB [00:05, 32.0MB/s] 'roi_heads.box_predictor.cls_score.weight' has shape (81, 1024) in the checkpoint but (2, 1024) in the model! Skipped. 'roi_heads.box_predictor.cls_score.bias' has shape (81,) in the checkpoint but (2,) in the model! Skipped. 'roi_heads.box_predictor.bbox_pred.weight' has shape (320, 1024) in the checkpoint but (4, 1024) in the model! Skipped. 'roi_heads.box_predictor.bbox_pred.bias' has shape (320,) in the checkpoint but (4,) in the model! Skipped. 'roi_heads.mask_head.predictor.weight' has shape (80, 256, 1, 1) in the checkpoint but (1, 256, 1, 1) in the model! Skipped. 'roi_heads.mask_head.predictor.bias' has shape (80,) in the checkpoint but (1,) in the model! Skipped. [10/15 06:16:14 d2.engine.train_loop]: Starting training from iteration 0 [10/15 06:16:40 d2.utils.events]: eta: 0:06:11 iter: 19 total_loss: 2.034 loss_cls: 0.675 loss_box_reg: 0.601 loss_mask: 0.694 loss_rpn_cls: 0.040 loss_rpn_loc: 0.008 time: 1.3195 data_time: 0.0048 lr: 0.000005 max_mem: 2548M [10/15 06:17:09 d2.utils.events]: eta: 0:06:03 iter: 39 total_loss: 1.980 loss_cls: 0.645 loss_box_reg: 0.624 loss_mask: 0.668 loss_rpn_cls: 0.020 loss_rpn_loc: 0.005 time: 1.3679 data_time: 0.0049 lr: 0.000010 max_mem: 2548M [10/15 06:17:35 d2.utils.events]: eta: 0:05:25 iter: 59 total_loss: 1.890 loss_cls: 0.570 loss_box_reg: 0.664 loss_mask: 0.612 loss_rpn_cls: 0.020 loss_rpn_loc: 0.006 time: 1.3569 data_time: 0.0042 lr: 0.000015 max_mem: 2548M [10/15 06:18:02 d2.utils.events]: eta: 0:04:58 iter: 79 total_loss: 1.692 loss_cls: 0.497 loss_box_reg: 0.642 loss_mask: 0.545 loss_rpn_cls: 0.051 loss_rpn_loc: 0.007 time: 1.3498 data_time: 0.0041 lr: 0.000020 max_mem: 2548M [10/15 06:18:30 d2.utils.events]: eta: 0:04:34 iter: 99 total_loss: 1.665 loss_cls: 0.446 loss_box_reg: 0.680 loss_mask: 0.475 loss_rpn_cls: 0.019 loss_rpn_loc: 0.007 time: 1.3566 data_time: 0.0040 lr: 0.000025 max_mem: 2548M [10/15 06:18:56 d2.utils.events]: eta: 0:04:05 iter: 119 total_loss: 1.454 loss_cls: 0.414 loss_box_reg: 0.634 loss_mask: 0.417 loss_rpn_cls: 0.037 loss_rpn_loc: 0.010 time: 1.3515 data_time: 0.0040 lr: 0.000030 max_mem: 2548M [10/15 06:19:23 d2.utils.events]: eta: 0:03:38 iter: 139 total_loss: 1.428 loss_cls: 0.369 loss_box_reg: 0.641 loss_mask: 0.382 loss_rpn_cls: 0.028 loss_rpn_loc: 0.009 time: 1.3515 data_time: 0.0044 lr: 0.000035 max_mem: 2669M [10/15 06:19:50 d2.utils.events]: eta: 0:03:10 iter: 159 total_loss: 1.295 loss_cls: 0.314 loss_box_reg: 0.606 loss_mask: 0.327 loss_rpn_cls: 0.022 loss_rpn_loc: 0.006 time: 1.3487 data_time: 0.0042 lr: 0.000040 max_mem: 2669M [10/15 06:20:18 d2.utils.events]: eta: 0:02:44 iter: 179 total_loss: 1.173 loss_cls: 0.284 loss_box_reg: 0.638 loss_mask: 0.276 loss_rpn_cls: 0.012 loss_rpn_loc: 0.005 time: 1.3549 data_time: 0.0042 lr: 0.000045 max_mem: 2669M [10/15 06:20:43 d2.utils.events]: eta: 0:02:15 iter: 199 total_loss: 1.243 loss_cls: 0.280 loss_box_reg: 0.643 loss_mask: 0.243 loss_rpn_cls: 0.017 loss_rpn_loc: 0.007 time: 1.3461 
data_time: 0.0042 lr: 0.000050 max_mem: 2669M [10/15 06:21:09 d2.utils.events]: eta: 0:01:48 iter: 219 total_loss: 1.111 loss_cls: 0.226 loss_box_reg: 0.542 loss_mask: 0.223 loss_rpn_cls: 0.022 loss_rpn_loc: 0.005 time: 1.3423 data_time: 0.0044 lr: 0.000055 max_mem: 2669M [10/15 06:21:37 d2.utils.events]: eta: 0:01:21 iter: 239 total_loss: 1.112 loss_cls: 0.220 loss_box_reg: 0.623 loss_mask: 0.210 loss_rpn_cls: 0.028 loss_rpn_loc: 0.006 time: 1.3454 data_time: 0.0042 lr: 0.000060 max_mem: 2669M [10/15 06:22:04 d2.utils.events]: eta: 0:00:54 iter: 259 total_loss: 0.995 loss_cls: 0.176 loss_box_reg: 0.653 loss_mask: 0.164 loss_rpn_cls: 0.011 loss_rpn_loc: 0.008 time: 1.3461 data_time: 0.0047 lr: 0.000065 max_mem: 2669M [10/15 06:22:31 d2.utils.events]: eta: 0:00:28 iter: 279 total_loss: 0.865 loss_cls: 0.156 loss_box_reg: 0.494 loss_mask: 0.137 loss_rpn_cls: 0.014 loss_rpn_loc: 0.006 time: 1.3460 data_time: 0.0045 lr: 0.000070 max_mem: 2758M [10/15 06:23:00 d2.utils.events]: eta: 0:00:01 iter: 299 total_loss: 0.869 loss_cls: 0.159 loss_box_reg: 0.534 loss_mask: 0.150 loss_rpn_cls: 0.030 loss_rpn_loc: 0.012 time: 1.3476 data_time: 0.0047 lr: 0.000075 max_mem: 2758M [10/15 06:23:01 d2.engine.hooks]: Overall training speed: 297 iterations in 0:06:41 (1.3521 s / it) [10/15 06:23:01 d2.engine.hooks]: Total training time: 0:06:44 (0:00:02 on hooks) OrderedDict()

Visualize the training curves in TensorBoard (notebook magics):

%load_ext tensorboard
%tensorboard --logdir output

5. Model Inference

After training, run inference with the trained model on the balloon validation set.

import random
import cv2
import matplotlib.pyplot as plt
from detectron2.engine import DefaultPredictor
from detectron2.utils.visualizer import Visualizer, ColorMode

cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, "model_final.pth")  # use the fine-tuned weights
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.7  # confidence threshold for testing
cfg.DATASETS.TEST = ("balloon/val",)
predictor = DefaultPredictor(cfg)

# visualize predictions on randomly sampled validation images
dataset_dicts = get_balloon_dicts("balloon/val")
for d in random.sample(dataset_dicts, 3):
    im = cv2.imread(d["file_name"])
    outputs = predictor(im)
    v = Visualizer(im[:, :, ::-1],
                   metadata=balloon_metadata,
                   scale=0.8,
                   instance_mode=ColorMode.IMAGE_BW  # desaturate pixels outside balloon regions
                   )
    v = v.draw_instance_predictions(outputs["instances"].to("cpu"))
    plt.imshow(v.get_image())  # get_image() is already RGB
    plt.show()

(Figure: predicted balloon masks on sampled validation images, with non-balloon regions shown in grayscale.)
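For programmatic access instead of visualization, the prediction fields can be read directly from the Instances object (a small sketch, continuing from the loop above):

instances = outputs["instances"].to("cpu")
print(instances.pred_boxes)        # Boxes object, one box per detection
print(instances.scores)            # confidence score per detection
print(instances.pred_classes)      # class index per detection (0 = balloon here)
print(instances.pred_masks.shape)  # (N, H, W) boolean masks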

6. Model Evaluation

Evaluate with the AP metrics from the COCO API.

from detectron2.evaluation import COCOEvaluator, inference_on_dataset
from detectron2.data import build_detection_test_loader

evaluator = COCOEvaluator("balloon/val", cfg, False, output_dir="./output/")
val_loader = build_detection_test_loader(cfg, "balloon/val")
inference_on_dataset(trainer.model, val_loader, evaluator)

Sample output:

[11/15 04:10:32 d2.evaluation.evaluator]: Start inference on 13 images [11/15 04:10:46 d2.evaluation.evaluator]: Total inference time: 0:00:05 (0.625000 s / img per device, on 1 devices) [11/15 04:10:46 d2.evaluation.evaluator]: Total inference pure compute time: 0:00:03 (0.437190 s / img per device, on 1 devices) [11/15 04:10:46 d2.evaluation.coco_evaluation]: Preparing results for COCO format ... [11/15 04:10:46 d2.evaluation.coco_evaluation]: Saving results to ./output/coco_instances_results.json [11/15 04:10:46 d2.evaluation.coco_evaluation]: Evaluating predictions ... Loading and preparing results... DONE (t=0.00s) creating index... index created! Running per image evaluation... Evaluate annotation type *bbox* DONE (t=0.13s). Accumulating evaluation results... DONE (t=0.02s). Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.681 Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.849 Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.828 Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.011 Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.572 Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.816 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.228 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.714 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.750 Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.133 Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.671 Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.857 [11/15 04:10:46 d2.evaluation.coco_evaluation]: Evaluation results for bbox: | AP | AP50 | AP75 | APs | APm | APl | |:------:|:------:|:------:|:-----:|:------:|:------:| | 68.133 | 84.928 | 82.807 | 1.106 | 57.231 | 81.589 | Loading and preparing results... DONE (t=0.02s) creating index... index created! Running per image evaluation... Evaluate annotation type *segm* DONE (t=0.15s). Accumulating evaluation results... DONE (t=0.02s). 
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.770 Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.846 Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.840 Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.008 Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.589 Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.936 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.248 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.780 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.820 Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.233 Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.682 Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.957 [11/15 04:10:46 d2.evaluation.coco_evaluation]: Evaluation results for segm: | AP | AP50 | AP75 | APs | APm | APl | |:------:|:------:|:------:|:-----:|:------:|:------:| | 76.976 | 84.647 | 84.017 | 0.763 | 58.887 | 93.632 | OrderedDict([('bbox', {'AP': 68.13251528146942, 'AP50': 84.9277873853965, 'AP75': 82.8068039080166, 'APl': 81.58885128967246, 'APm': 57.23126698500541, 'APs': 1.1056105610561056}), ('segm', {'AP': 76.97562111397131, 'AP50': 84.64734364022362, 'AP75': 84.01711775272273, 'APl': 93.63197968598362, 'APm': 58.887208805837645, 'APs': 0.763201320132013})])
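inference_on_dataset returns the metrics as an OrderedDict keyed by task, as seen at the end of the log. The values can be read directly (a small sketch):

results = inference_on_dataset(trainer.model, val_loader, evaluator)
print(results["bbox"]["AP"])   # box AP, e.g. ~68.1 on this run
print(results["segm"]["AP"])   # mask AP, e.g. ~77.0 on this run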

7. References

[1] - Colab - Detectron2 Tutorial.ipynb

[2] - Docs - Use Custom Datasets

[3] - Getting started with object detection and segmentation using detectron2 (物体检测和分割轻松上手:从detectron2开始) - Zhihu
