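The COCO dataset contains more than 200,000 images and 80 object categories. Every object instance is annotated with a detailed segmentation mask, for a total of more than 500,000 annotated object instances.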
{
person # 1
vehicle #8
{bicycle
car
motorcycle
airplane
bus
train
truck
boat}
outdoor #5
{traffic light
fire hydrant
stop sign
parking meter
bench}
animal #10
{bird
cat
dog
horse
sheep
cow
elephant
bear
zebra
giraffe}
accessory #5
{backpack
umbrella
handbag
tie
suitcase
}
sports #10
{frisbee
skis
snowboard
sports ball
kite
baseball bat
baseball glove
skateboard
surfboard
tennis racket
}
kitchen #7
{bottle
wine glass
cup
fork
knife
spoon
bowl
}
food #10
{banana
apple
sandwich
orange
broccoli
carrot
hot dog
pizza
donut
cake
}
furniture #6
{chair
couch
potted plant
bed
dining table
toilet
}
electronic #6
{tv
laptop
mouse
remote
keyboard
cell phone
}
appliance #5
{microwave
oven
toaster
sink
refrigerator
}
indoor #7
{book
clock
vase
scissors
teddy bear
hair drier
toothbrush
}
}
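As a quick sanity check on the list above, the per-supercategory counts (the numbers after each `#`) can be summed. This is only a sketch: the dictionary below is transcribed by hand from the list and the variable names are illustrative.

```python
# Counts transcribed from the supercategory list above (the numbers after "#").
supercategory_counts = {
    "person": 1,
    "vehicle": 8,
    "outdoor": 5,
    "animal": 10,
    "accessory": 5,
    "sports": 10,
    "kitchen": 7,
    "food": 10,
    "furniture": 6,
    "electronic": 6,
    "appliance": 5,
    "indoor": 7,
}

# The 12 supercategories should add up to the 80 object categories.
total = sum(supercategory_counts.values())
print(len(supercategory_counts), total)
```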
Note:
PASCAL VOC semantic categories (#20):
{
aeroplane
bicycle
bird
boat
bottle
bus
car
cat
chair
cow
diningtable
dog
horse
motorbike
person
pottedplant
sheep
sofa
train
tvmonitor
}
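Several VOC names differ in spelling from their COCO counterparts in the list further up. A minimal sketch of a name mapping follows. The pairing is an assumption made here for illustration (for example, treating `tvmonitor` as `tv` and `sofa` as `couch`), not an official correspondence table, and `voc_to_coco` is a hypothetical helper.

```python
# VOC class names exactly as listed above.
VOC_CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat",
    "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]

# Assumed mapping for the names that are spelled differently in COCO.
VOC_TO_COCO = {
    "aeroplane": "airplane",
    "diningtable": "dining table",
    "motorbike": "motorcycle",
    "pottedplant": "potted plant",
    "sofa": "couch",
    "tvmonitor": "tv",
}


def voc_to_coco(name):
    """Return the assumed COCO name for a VOC class; identical names pass through."""
    return VOC_TO_COCO.get(name, name)


coco_names = [voc_to_coco(name) for name in VOC_CLASSES]
print(coco_names)
```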
<h2>COCO Dataset</h2>
Annotation data formats:
- object instances
- object keypoints
- image captions
The basic data structure is as follows:
{
"info" : info,
"images" : [image],
"annotations" : [annotation],
"licenses" : [license],
}
info {
"year" : int,
"version" : str,
"description" : str,
"contributor" : str,
"url" : str,
"date_created" : datetime,
}
image{
"id" : int, # image id
"width" : int, # image width
"height" : int, # image height
"file_name" : str, # image file name
"license" : int,
"flickr_url" : str,
"coco_url" : str, # image URL
"date_captured" : datetime, # time the image was captured
}
license{
"id" : int,
"name" : str,
"url" : str,
}
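A minimal sketch of how this top-level structure might be consumed in Python, using only the standard library. The sample dictionary is hand-written for illustration (all values are invented), and the names `images_by_id` and `licenses_by_id` are assumptions of this sketch. With a real annotation file you would read the JSON from disk instead of building it inline.

```python
import json

# Hypothetical toy data that follows the structure described above.
sample = {
    "info": {
        "year": 2017, "version": "1.0", "description": "toy example",
        "contributor": "", "url": "", "date_created": "2017/09/01",
    },
    "images": [
        {
            "id": 1, "width": 640, "height": 480,
            "file_name": "000000000001.jpg", "license": 1,
            "flickr_url": "", "coco_url": "",
            "date_captured": "2013-11-14 17:02:52",
        },
    ],
    "annotations": [],
    "licenses": [{"id": 1, "name": "toy license", "url": ""}],
}

# Round-trip through JSON text, as if the data had been read from a file.
data = json.loads(json.dumps(sample))

# Index images and licenses by id so later lookups are dictionary accesses.
images_by_id = {img["id"]: img for img in data["images"]}
licenses_by_id = {lic["id"]: lic for lic in data["licenses"]}

first = images_by_id[1]
print(first["file_name"], first["width"], first["height"])
print(licenses_by_id[first["license"]]["name"])
```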
<h2>Object Instance Annotations</h2>
Instance annotation format:
annotation{
"id" : int,
"image_id" : int,
"category_id" : int,
"segmentation" : RLE or [polygon],
"area" : float,
"bbox" : [x,y,width,height],
"iscrowd" : 0 or 1,
}
categories[{
"id" : int,
"name" : str,
"supercategory" : str,
}]
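Where:
If the instance is a single object, then iscrowd=0 and segmentation is given as polygons; a single object may need several polygons, for example when it is occluded.
If the instance is a collection of objects, then iscrowd=1 and segmentation is given as RLE. iscrowd=1 is used to label large groups of objects, such as a crowd of people.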
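The two cases can be told apart in code, and the `[x,y,width,height]` box is often converted to corner coordinates. The sketch below uses two hand-written annotations with invented values. It assumes polygons are stored as a list of flat `[x1, y1, x2, y2, ...]` lists and RLE as a dictionary with `counts` and `size` keys. The helper names are hypothetical.

```python
def bbox_to_corners(bbox):
    """Convert [x, y, width, height] to (x_min, y_min, x_max, y_max)."""
    x, y, w, h = bbox
    return (x, y, x + w, y + h)


def segmentation_kind(annotation):
    """Return "rle" when segmentation is a dict, otherwise "polygon"."""
    return "rle" if isinstance(annotation["segmentation"], dict) else "polygon"


# Hypothetical single-object annotation: one rectangular polygon.
single = {
    "id": 1, "image_id": 1, "category_id": 18,
    "segmentation": [[10.0, 10.0, 60.0, 10.0, 60.0, 40.0, 10.0, 40.0]],
    "area": 1500.0, "bbox": [10.0, 10.0, 50.0, 30.0], "iscrowd": 0,
}

# Hypothetical crowd annotation: RLE over a tiny 2x2 mask.
crowd = {
    "id": 2, "image_id": 1, "category_id": 1,
    "segmentation": {"counts": [0, 4], "size": [2, 2]},
    "area": 4.0, "bbox": [0.0, 0.0, 2.0, 2.0], "iscrowd": 1,
}

for ann in (single, crowd):
    print(ann["iscrowd"], segmentation_kind(ann), bbox_to_corners(ann["bbox"]))
```

Checking the type of `segmentation` rather than trusting `iscrowd` alone is a defensive choice in this sketch, not something the format requires.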
<h2>Object Keypoint Annotations</h2>
Keypoint annotation format:
annotation{
"keypoints" : [x1,y1,v1,...],
"num_keypoints" : int,
"[cloned]" : ...,
}
categories[{
"keypoints" : [str],
"skeleton" : [edge],
"[cloned]" : ...,
}]
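A keypoint annotation contains all the fields of an object instance annotation (id, bbox, and so on) plus two additional fields.
"keypoints" is an array of length 3K, where K is the total number of keypoints defined for the category. Each keypoint has a position [x,y] and a visibility flag v:
v=0: the keypoint is not labeled, in which case x=y=0;
v=1: the keypoint is labeled but not visible;
v=2: the keypoint is labeled and visible.
A keypoint is considered visible if it falls inside the object segment.
"num_keypoints" is the number of labeled keypoints (v>0) for the object. Many objects, such as crowds and small objects, have num_keypoints=0.
For each category, the categories struct has two additional fields: "keypoints" and "skeleton".
"keypoints" is an array of K keypoint names (strings);
"skeleton" defines the connectivity between keypoints as a list of keypoint edge pairs and is used for visualization.
COCO currently provides keypoint annotations only for the person category.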
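A minimal sketch of unpacking the flat "keypoints" array into triples and recomputing "num_keypoints" as the number of entries with v>0. The three keypoint names and all coordinates are invented for illustration, and both helper functions are hypothetical.

```python
def split_keypoints(flat):
    """Group a flat [x1, y1, v1, ...] list into (x, y, v) triples."""
    if len(flat) % 3 != 0:
        raise ValueError("keypoints length must be a multiple of 3")
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]


def count_labeled(flat):
    """Count keypoints whose visibility flag v is greater than 0."""
    return sum(1 for _, _, v in split_keypoints(flat) if v > 0)


# Hypothetical category with K = 3 keypoints.
names = ["nose", "left_eye", "right_eye"]

# Flat array of length 3K; the last keypoint has v = 0 and x = y = 0.
keypoints = [100, 50, 2, 95, 45, 1, 0, 0, 0]

for name, (x, y, v) in zip(names, split_keypoints(keypoints)):
    print(name, x, y, v)

print(count_labeled(keypoints))
```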
<h2>Image Caption Annotations</h2>
Image caption annotation format:
annotation{
"id" : int,
"image_id" : int,
"caption" : str,
}
Caption annotations store textual descriptions of images. Each caption describes the specified image, and each image has at least 5 captions.
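Because each caption annotation carries an "image_id", the annotations can be grouped per image. The following is a sketch with invented caption records; `group_captions` is a hypothetical helper, and the toy data has fewer entries per image than real data would.

```python
from collections import defaultdict


def group_captions(annotations):
    """Map each image_id to the list of its caption strings, in input order."""
    grouped = defaultdict(list)
    for ann in annotations:
        grouped[ann["image_id"]].append(ann["caption"])
    return dict(grouped)


# Hypothetical caption annotations for two images.
annotations = [
    {"id": 1, "image_id": 7, "caption": "A dog runs across the grass."},
    {"id": 2, "image_id": 7, "caption": "A brown dog playing outside."},
    {"id": 3, "image_id": 9, "caption": "Two people riding bicycles."},
]

captions_by_image = group_captions(annotations)
for image_id, captions in captions_by_image.items():
    print(image_id, len(captions), captions[0])
```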