Original article: Image Segmentation: Tips and Tricks from 39 Kaggle Competitions - 2020.04.07

Author: Derrick Mwiti

Partly translated, partly reposted, for study purposes.

A summary of tips and tricks collected from 39 Kaggle competitions, grouped as follows:

1. External Data

[1] - Use the LUng Nodule Analysis Grand Challenge data, which contains detailed annotations from radiologists.

[2] - Use the LIDC-IDRI data, which contains radiologist descriptions of each tumor.

[3] - Use the Flickr CC and Wikipedia Commons datasets.

[4] - Use the Human Protein Atlas Dataset.

[5] - Use the IDRiD dataset.

2. Data Exploration and Gaining Insights

[1] - Clustering of 3D segmentations thresholded at 0.5

[2] - Identify whether there is a substantial difference between the train and test label distributions.
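
Item [1] above is terse. One plausible reading, assumed here, is: threshold the predicted probability volume at 0.5, then group the surviving voxels into connected clusters (for example candidate nodules) that can be inspected one by one. The sketch below does this with a breadth-first flood fill using 6-connectivity; the function name `cluster_3d` is hypothetical, and in practice `scipy.ndimage.label(prob > 0.5)` gives an equivalent labeling with less code.

```python
from collections import deque

import numpy as np


def cluster_3d(prob, threshold=0.5):
    """Threshold a 3D probability volume and label 6-connected clusters.

    Returns (labels, n_clusters): labels has the same shape as `prob`,
    0 marks background and 1..n_clusters mark the clusters.
    """
    mask = prob > threshold
    labels = np.zeros(mask.shape, dtype=np.int32)
    # Face neighbours only (6-connectivity); diagonal voxels are not joined.
    offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    n_clusters = 0
    for seed in zip(*np.nonzero(mask)):
        if labels[seed]:
            continue  # voxel already belongs to a cluster
        n_clusters += 1
        labels[seed] = n_clusters
        queue = deque([seed])
        while queue:  # breadth-first flood fill from the seed voxel
            z, y, x = queue.popleft()
            for dz, dy, dx in offsets:
                nb = (z + dz, y + dy, x + dx)
                inside = all(0 <= c < s for c, s in zip(nb, mask.shape))
                if inside and mask[nb] and not labels[nb]:
                    labels[nb] = n_clusters
                    queue.append(nb)
    return labels, n_clusters
```

Cluster sizes then come from `np.bincount(labels.ravel())[1:]`, which is a quick way to spot spurious one-voxel detections.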

3. Preprocessing

[1] - Use the Difference of Gaussian (DoG) method for blob detection, implemented with a function from the skimage library (skimage.feature.blob_dog).

[2] - Use patch-based inputs for training to reduce training time.

[3] - Use cudf instead of Pandas for data loading, since it reads data faster.

[4] - Ensure that all the images have the same orientation.

[5] - Apply contrast limited adaptive histogram equalization (CLAHE).

[6] - Use OpenCV for all general image processing operations.

[7] - Employ automatic active learning and add manual annotations.

[8] - Resize all images to the same resolution in order to apply the same model to scans of different thicknesses.

[9] - Convert scan images into normalized 3D numpy arrays.

[10] - Apply single image haze removal using the Dark Channel Prior.

[11] - Convert all data to Hounsfield units.

[12] - Find duplicate images using pair-wise correlation on RGBY.

[13] - Make labels more balanced by developing a sampler (https://www.sebastiansylvan.com/post/importancesampling/).

[14] - Apply pseudo labeling to test data in order to improve the score.

[15] - Scale down images/masks to 320×480.

[16] - Histogram equalization (CLAHE) with kernel size 32×32.

[17] - Convert DCM to PNG.

[18] - Calculate the md5 hash of each image to find duplicate images.
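
Items [9], [11] and [18] can be sketched with NumPy and `hashlib`. The sketch assumes the raw pixel array and the DICOM `RescaleSlope`/`RescaleIntercept` values have already been read (for example with pydicom, not shown). The window of -1000 to 400 HU is an illustrative assumption, not something the source specifies, and all function names are hypothetical.

```python
import hashlib
from collections import defaultdict

import numpy as np


def to_hounsfield(pixels, slope, intercept):
    """Map stored CT values to Hounsfield units: HU = pixel * slope + intercept."""
    return pixels.astype(np.float32) * slope + intercept


def normalize_hu(volume, hu_min=-1000.0, hu_max=400.0):
    """Clip a HU volume to an (assumed) window and scale it to [0, 1]."""
    volume = np.clip(volume, hu_min, hu_max)
    return (volume - hu_min) / (hu_max - hu_min)


def find_duplicates(images):
    """Group image ids whose raw bytes share an md5 hash.

    `images` maps an image id to a bytes object (e.g. the file contents).
    Returns a list of id groups that have more than one member.
    """
    groups = defaultdict(list)
    for image_id, data in images.items():
        groups[hashlib.md5(data).hexdigest()].append(image_id)
    return [ids for ids in groups.values() if len(ids) > 1]
```

Note that md5 only catches byte-identical files; near-duplicates (re-encoded or resized copies) need a similarity measure such as the pair-wise correlation of item [12].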

4. Data Augmentation

[1] - Use the albumentations library for image augmentation.

[2] - Apply random rotation by 90 degrees.

[3] - Use horizontal, vertical or both flips.

[4] - Attempt heavy geometric transformations: Elastic Transform, PerspectiveTransform, Piecewise Affine transforms, pincushion distortion.

[5] - Apply random HSV.

[6] - Use loss-less augmentation for generalization, to prevent loss of useful image information.

[7] - Apply channel shuffling.

[8] - Do data augmentation based on class frequency.

[9] - Apply Gaussian noise.

[10] - Use lossless permutations of 3D images for data augmentation.

[11] - Rotate by a random angle from 0 to 45 degrees.

[12] - Scale by a random factor from 0.8 to 1.2.

[13] - Change the brightness.

[14] - Randomly change hue, saturation and value.

[15] - Apply D4 augmentations.

[16] - Contrast limited adaptive histogram equalization (CLAHE).

[17] - Use the AutoAugment augmentation strategy.
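
Items [2], [3] and [15] overlap: "D4" refers to the dihedral group of the square, i.e. the 8 combinations of 90-degree rotations and flips, none of which interpolate pixels. Below is a minimal NumPy sketch, assuming images are arrays with height and width on the first two axes. The point that matters for segmentation is that the image and its mask must receive the same transform. Function names are hypothetical; albumentations offers comparable transforms such as `RandomRotate90` and `HorizontalFlip`.

```python
import numpy as np


def d4_transforms(image):
    """Return the 8 D4 variants of an HxW(xC) array:
    4 rotations by multiples of 90 degrees, each with and without a flip."""
    out = []
    for k in range(4):
        rotated = np.rot90(image, k, axes=(0, 1))
        out.append(rotated)
        out.append(np.flip(rotated, axis=1))  # mirror left-right
    return out


def random_d4(image, mask, rng):
    """Apply the same randomly chosen D4 transform to an image and its mask."""
    k = int(rng.integers(4))
    flip = bool(rng.integers(2))
    image = np.rot90(image, k, axes=(0, 1))
    mask = np.rot90(mask, k, axes=(0, 1))
    if flip:
        image = np.flip(image, axis=1)
        mask = np.flip(mask, axis=1)
    # rot90/flip return views; make contiguous copies for downstream code.
    return np.ascontiguousarray(image), np.ascontiguousarray(mask)
```

Usage: `img_aug, mask_aug = random_d4(img, mask, np.random.default_rng(0))`. The same 8 variants can also be reused at test time (see post-processing).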

5. Models

5.1. Architectures

[1] - Use of a U-Net based architecture, adopting its concepts and applying them to 3D input tensors

[2] - The Inception-ResNet v2 architecture for training features with different receptive fields

[3] - Siamese networks with adversarial training

[4] - ResNet50, Xception, Inception ResNet v2 x 5 with Dense (FC) layer as the final layer

[5] - Use of a global max-pooling layer which returns a fixed-length output no matter the input size

[6] - Use of stacked dilated convolutions

[7] - VoxelNet

[8] - Replace the addition (plus sign) in LinkNet skip connections with concatenation followed by a 1x1 convolution

[9] - Generalized mean pooling

[10] - Keras NASNetLarge to train the model from scratch using 224x224x3

[11] - Use of a 3D convnet that slides over the images

[12] - Imagenet-pre-trained ResNet152 as the feature extractor

[13] - Replace the final fully-connected layers of ResNet by 3 fully connected layers with dropout

[14] - Use ConvTranspose in the decoder

[15] - Applying the VGG baseline architecture

[16] - Implementing the C3D network with adjusted receptive fields and a 64 unit bottleneck layer on the end of the network

[17] - Use of UNet type architectures with pre-trained weights to improve convergence and performance of binary segmentation on 8-bit RGB input images

[18] - LinkNet since it’s fast and memory efficient

[19] - Mask R-CNN

[20] - BN-Inception

[21] - Fast Point R-CNN

[22] - Seresnext

[23] - UNet and Deeplabv3

[24] - Faster RCNN

[25] - SENet154

[26] - ResNet152

[27] - NASNet-A-Large

[28] - EfficientNetB4

[29] - ResNet101

[30] - GAPNet

[31] - PNASNet-5-Large

[32] - Densenet121

[33] - AC-GAN

[34] - XceptionNet (96), XceptionNet (299), Inception v3 (139), InceptionResNet v2 (299), DenseNet121 (224)

[35] - AlbuNet (resnet34) from ternausnets

[36] - SpaceNet

[37] - Resnet50 from selim_sef SpaceNet 4

[38] - SCSEUnet (seresnext50) from selim_sef SpaceNet 4

[39] - A custom Unet and Linknet architecture

[40] - FPNetResNet50 (5 folds)

[41] - FPNetResNet101 (5 folds)

[42] - FPNetResNet101 (7 folds with different seeds)

[43] - PANetDilatedResNet34 (4 folds)

[44] - PANetResNet50 (4 folds)

[45] - EMANetResNet101 (2 folds)

[46] - RetinaNet

[47] - Deformable R-FCN

[48] - Deformable Relation Networks
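
Items [5] and [9] describe pooling layers that collapse a feature map of any spatial size into one value per channel. The NumPy sketch below shows only the forward computation for a single (C, H, W) feature map. Generalized mean (GeM) pooling computes (mean of x^p)^(1/p), which equals average pooling at p = 1 and approaches max pooling as p grows. The default p = 3 is an assumption (p can also be learned), and the function names are hypothetical.

```python
import numpy as np


def global_max_pool(features):
    """(C, H, W) -> (C,): max over all spatial positions.

    The output length depends only on C, not on the input's H and W."""
    return features.reshape(features.shape[0], -1).max(axis=1)


def gem_pool(features, p=3.0, eps=1e-6):
    """Generalized mean pooling per channel: (mean over H, W of x**p) ** (1/p).

    p=1 gives average pooling; large p approaches max pooling."""
    x = np.clip(features, eps, None)  # GeM assumes non-negative (post-ReLU) inputs
    return (x ** p).reshape(x.shape[0], -1).mean(axis=1) ** (1.0 / p)
```

Because both functions reduce over the spatial axes, a classifier head placed after them accepts images of any resolution.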

5.2. Hardware

[1] - Use of the AWS GPU instance p2.xlarge with an NVIDIA K80 GPU

[2] - Pascal Titan-X GPU

[3] - Use of 8 TITAN X GPUs

[4] - 6 GPUs: 2×1080Ti + 4×1080

[5] - Server with 8×NVIDIA Tesla P40, 256 GB RAM and 28 CPU cores

[6] - Intel Core i7 5930k, 2×1080, 64 GB of RAM, 2x512GB SSD, 3TB HDD

[7] - GCP 1x P100, 8x CPU, 15 GB RAM, SSD or 2x P100, 16x CPU, 30 GB RAM

[8] - NVIDIA Tesla P100 GPU with 16GB of RAM

[9] - Intel Core i7 5930k, 2×1080, 64 GB of RAM, 2x512GB SSD, 3TB HDD

[10] - 980Ti GPU, 2600k CPU, and 14GB RAM

5.3. Loss Functions

[1] - Dice coefficient, which is more effective on imbalanced data.

[2] - Weighted boundary loss, used to reduce the distance between the predicted segmentation and the ground truth.

[3] - MultiLabelSoftMarginLoss, which creates a criterion that optimizes a multi-label one-versus-all loss based on max-entropy between input and target

[4] - Balanced cross entropy (BCE) with logit loss, which weights the positive and negative examples by a certain coefficient

[5] - Lovasz loss, which directly optimizes the mean intersection-over-union loss in neural networks based on the convex Lovasz extension of submodular losses

[6] - FocalLoss + Lovasz obtained by summing the Focal and Lovasz losses

[7] - Arc margin loss, which adds a margin in order to maximize the separability of face classes

[8] - Npairs loss that computes the npairs loss between y_true and y_pred.

[9] - A combination of BCE and Dice loss functions

[10] - LSEP, a pairwise ranking loss that is smooth everywhere and is therefore easier to optimize

[11] - Center loss that simultaneously learns a center for deep features of each class and penalizes the distances between the deep features and their corresponding class centers

[12] - Ring Loss that augments standard loss functions such as Softmax

[13] - Hard triplet loss, which trains a network to embed features of the same class close together while maximizing the embedding distance between different classes

[14] - 1 + BCE - Dice, i.e. 1 plus the BCE loss minus the Dice coefficient (equivalent to BCE plus the Dice loss 1 - Dice)

[15] - Binary cross-entropy - log(Dice), i.e. the binary cross-entropy minus the log of the Dice coefficient

[16] - Combinations of BCE, Dice and focal losses

[17] - Lovasz loss, which directly optimizes the mean intersection-over-union loss

[18] - BCE + Dice, where the Dice loss is computed from a smoothed Dice coefficient

[19] - Focal loss with gamma = 2, an improvement on the standard cross-entropy criterion

[20] - BCE + DICE + Focal – this is basically a summation of the three loss functions

[21] - Active Contour Loss that incorporates the area and size information and integrates the information in a dense deep learning model

[22] - 1024 * BCE(results, masks) + BCE(cls, cls_target)

[23] - Focal + Kappa: the Kappa loss is designed for multi-class classification of ordinal data, and here it is summed with the focal loss

[24] - ArcFaceLoss — Additive Angular Margin Loss for Deep Face Recognition

[25] - soft Dice trained on positives only – Soft Dice uses predicted probabilities

[26] - 2.7 * BCE(pred_mask, gt_mask) + 0.9 * DICE(pred_mask, gt_mask) + 0.1 * BCE(pred_empty, gt_empty), a custom loss used by one Kaggler

[27] - nn.SmoothL1Loss() that creates a criterion that uses a squared term if the absolute element-wise error falls below 1 and an L1 term otherwise

[28] - Use of the Mean Squared Error objective in scenarios where it seems to work better than the binary cross-entropy objective.
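
Several entries above are sums of a few basic terms ([9], [16], [18], [19], [20]). Their forward computations for binary masks can be sketched in NumPy as follows; in training you would write the same formulas in PyTorch or Keras so they are differentiable. The Dice smoothing constant of 1, the clipping epsilon and the function names are assumptions for illustration.

```python
import numpy as np


def bce(prob, target, eps=1e-7):
    """Mean binary cross-entropy between predicted probabilities and a 0/1 mask."""
    prob = np.clip(prob, eps, 1 - eps)  # avoid log(0)
    return float(-np.mean(target * np.log(prob) + (1 - target) * np.log(1 - prob)))


def soft_dice_loss(prob, target, smooth=1.0):
    """1 - soft Dice coefficient, computed on probabilities (not thresholded masks)."""
    intersection = np.sum(prob * target)
    dice = (2 * intersection + smooth) / (np.sum(prob) + np.sum(target) + smooth)
    return float(1 - dice)


def focal_loss(prob, target, gamma=2.0, eps=1e-7):
    """Binary focal loss: cross-entropy down-weighted by (1 - p_t) ** gamma,
    so confidently correct (easy) pixels contribute less."""
    prob = np.clip(prob, eps, 1 - eps)
    p_t = np.where(target == 1, prob, 1 - prob)  # probability of the true class
    return float(np.mean(-((1 - p_t) ** gamma) * np.log(p_t)))


def bce_dice(prob, target, w_bce=1.0, w_dice=1.0):
    """Weighted BCE + Dice combination; the weights are tunable hyperparameters."""
    return w_bce * bce(prob, target) + w_dice * soft_dice_loss(prob, target)
```

With gamma = 0 the focal loss reduces exactly to BCE, which is a handy sanity check; weighted combinations such as item [26] just change `w_bce` and `w_dice` and add further terms.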

5.4. Training Tips

[1] - Try different learning rates.

[2] - Try different batch sizes.

[3] - Use SGD with momentum with manual rate scheduling

[4] - Too much augmentation will reduce the accuracy.

[5] - Train on image crops and predict on full images.

[6] - Use Keras’s ReduceLROnPlateau() to adjust the learning rate.

[7] - Train without augmentation until the loss plateaus, then apply soft and hard augmentation for some epochs.

[8] - Freeze all layers except the last one and fine-tune using 1000 images from Stage 1.

[9] - Make labels more balanced by developing a sampler.

[10] - Use class-aware sampling

[11] - Use dropout and augmentation while tuning the last layer

[12] - Pseudo Labeling to improve score

[13] - Use Adam, reducing the LR on plateau with a patience of 2–4 epochs

[14] - Use Cyclic LR with SGD

[15] - Reduce the learning rate by a factor of two if validation loss does not improve for two consecutive epochs

[16] - Repeat the worst batch out of 10 batches

[17] - Train with default UNET

[18] - Overlap tiles so that each edge pixel is covered twice

[19] - Hyperparameter tuning: learning rate on training, non-maximum suppression and score threshold on inference

[20] - Remove bounding boxes with low confidence scores

[21] - Train different convolutional neural networks then build an ensemble

[22] - Stop training when the F1 score is decreasing

[23] - Use differential learning rates, reducing them gradually

[24] - Train ANNs in a stacking way using 5 folds and 30 repeats

[25] - Keep track of your experiments using Neptune
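
Items [6], [13] and [15] all describe reduce-on-plateau scheduling. Keras ships this as `keras.callbacks.ReduceLROnPlateau` (with `factor`, `patience` and `min_lr` arguments, among others). The framework-free sketch below, with a hypothetical class name, reproduces the core logic of item [15]: halve the learning rate when the validation loss has not improved for two consecutive epochs. It deliberately omits refinements such as Keras's `min_delta` and `cooldown`.

```python
class PlateauHalver:
    """Minimal reduce-on-plateau schedule: multiply the LR by `factor` when the
    validation loss has not improved for `patience` consecutive epochs."""

    def __init__(self, lr, factor=0.5, patience=2, min_lr=1e-6):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.min_lr = min_lr
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Call once per epoch with the validation loss; returns the LR to use next."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.bad_epochs = 0  # restart the count after each reduction
        return self.lr
```

Feeding `PlateauHalver(0.1)` the validation losses 1.0, 0.9, 0.95, 0.92, 0.8, 0.85, 0.86 yields learning rates 0.1, 0.1, 0.1, 0.05, 0.05, 0.05, 0.025.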

6. Evaluation and Cross-Validation

[1] - Split the data non-uniformly, stratified by class

[2] - Avoid overfitting by applying cross-validation while tuning the last layer

[3] - 10-fold CV ensemble for classification

[4] - Combination of 5 10-fold CV ensembles for detection

[5] - scikit-learn’s StratifiedKFold

[6] - 5-fold cross-validation

[7] - Adversarial Validation & Weighting
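
Items [1], [5] and [6] rely on stratified K-fold splits, in which each fold keeps roughly the class proportions of the full dataset. scikit-learn's `StratifiedKFold` is the usual tool; the dependency-free sketch below (hypothetical function names) shows the idea: shuffle each class separately, then deal its samples round-robin across the folds.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_splits=5, seed=0):
    """Return a fold id (0..n_splits-1) for every sample, spreading each class
    evenly over the folds."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    folds = [0] * len(labels)
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for i, idx in enumerate(indices):
            folds[idx] = (offset + i) % n_splits
        # Continue the round-robin so small classes don't all start in fold 0.
        offset += len(indices)
    return folds


def split(folds, k):
    """Train/validation index lists for fold k."""
    train = [i for i, f in enumerate(folds) if f != k]
    valid = [i for i, f in enumerate(folds) if f == k]
    return train, valid
```

With 50 samples of class 0 and 10 of class 1, every 5-fold validation split contains 10 and 2 samples of each class respectively.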

7. Ensembling Methods

[1] - Use simple majority voting for ensemble

[2] - XGBoost on the max malignancy at 3 zoom levels, the z-location and the amount of strange tissue

[3] - LightGBM for models with too many classes. This was done for raw data features only.

[4] - CatBoost for a second-layer model

[5] - Training with 7 features for the gradient boosting classifier

[6] - Use ‘curriculum learning’ to speed up model training. In this technique, models are first trained on simple samples and then progressively moved on to harder ones.

[7] - Ensemble with ResNet50, InceptionV3, and InceptionResNetV2

[8] - Ensemble method for object detection

[9] - An ensemble of Mask R-CNN, YOLOv3, and Faster R-CNN architectures with a classification network (DenseNet-121)
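
Item [1], simple majority voting, can be applied to per-image class predictions or, for segmentation, per pixel. A minimal pure-Python sketch with hypothetical function names: ties in `majority_vote` go to the label seen first in model order, which is how `collections.Counter.most_common` orders equal counts, and `vote_masks` requires a strict majority, so with an even number of models a 50/50 pixel becomes background.

```python
from collections import Counter


def majority_vote(predictions):
    """predictions: one list of labels per model, all the same length.
    Returns the most common label for each sample."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def vote_masks(masks):
    """Pixel-wise strict-majority vote over binary masks given as nested 0/1 lists."""
    n = len(masks)
    # zip(*masks) pairs up corresponding rows; zip(*rows) pairs up pixels.
    return [[int(sum(px) * 2 > n) for px in zip(*rows)] for rows in zip(*masks)]
```

For example, three models predicting `[0, 1, 2]`, `[0, 2, 2]` and `[1, 1, 0]` vote `[0, 1, 2]`.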

8. Post-processing

[1] - Apply test-time augmentation: present an image to the model several times with different random transformations and average the resulting predictions

[2] - Equalize test prediction probabilities instead of only using predicted classes

[3] - Apply geometric mean to the predictions

[4] - Overlap tiles during inference so that each edge pixel is covered at least three times, because U-Net tends to make poor predictions near tile edges.

[5] - Non-maximum suppression and bounding box shrinkage

[6] - Watershed post-processing to separate touching objects in instance segmentation problems.
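
Items [1] and [3] can be combined: run the model on flipped copies of the input, undo each flip on the output so the pixels line up again, and merge the predictions. The NumPy sketch below assumes `predict` is any function mapping an (H, W) image to an (H, W) probability map. It uses flips rather than random transforms because flips are exactly invertible. The function names are hypothetical; the TTA helper averages arithmetically, and `geometric_mean` is the alternative merge from item [3].

```python
import numpy as np


def predict_with_flip_tta(predict, image):
    """Average predictions over identity, horizontal flip and vertical flip.

    Each flipped prediction is flipped back so it aligns with the original image."""
    preds = [
        predict(image),
        np.flip(predict(np.flip(image, axis=1)), axis=1),
        np.flip(predict(np.flip(image, axis=0)), axis=0),
    ]
    return np.mean(preds, axis=0)


def geometric_mean(preds, eps=1e-7):
    """Geometric mean of probability maps: exp(mean(log p)), clipped to avoid log(0)."""
    stacked = np.clip(np.stack(preds), eps, 1.0)
    return np.exp(np.log(stacked).mean(axis=0))
```

For instance, merging two maps filled with 0.25 and 1.0 gives 0.5 everywhere, whereas the arithmetic mean would give 0.625; the geometric mean penalizes disagreement more strongly.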

Last modification: May 13th, 2020 at 08:45 am