Customizing Image Backgrounds with Semantic Segmentation and Alpha Blending [Translation]

> Original: [Whiten/customize the background of images using Semantic Segmentation and Alpha Blending - 2019.12.26](https://medium.com/ibm-garage/semantic-segmentation-and-alpha-blending-for-whitening-customizing-the-background-of-an-image-a8c65ea0e384)
>
> Author: [Shilpi Bhattacharyya](https://medium.com/@bhattacharyya.shilpi.sbu?source=post_page-----a8c65ea0e384----------------------)

![](https://aiuai.cn/uploads/2009/391febf04154448c.png)

Figure: Semantic segmentation and alpha blending used to change the background of an image.

![](https://aiuai.cn/uploads/2009/e7ce9a3a7f05e89d.png)

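Figure: The processing pipeline using semantic segmentation and alpha blending. Background pixels are set to 0.

> Full implementation: [https://github.com/shilpibhattacharyya/customizing_image_backgrounds](https://github.com/shilpibhattacharyya/customizing_image_backgrounds)

Have you ever been driven crazy trying to whiten or replace an image background in Photoshop? If so, meet semantic segmentation.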

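## 1. Semantic Segmentation

Semantic segmentation is an image analysis task that assigns a class to every pixel in an image. It mirrors how people see: when we look at a picture, we know which class each region of pixels belongs to.

> For more on semantic segmentation, see: [Image Segmentation - 2018.11.05](https://www.learnopencv.com/image-segmentation/)

Here, DeepLabV3 performs the semantic segmentation, and alpha blending merges the original image onto a new background.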
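To make "a class for every pixel" concrete, here is a minimal sketch with made-up scores (the array values and the three-class setup are assumptions for illustration, not output from DeepLab). A segmentation network typically produces one score map per class, and the label of each pixel is the class with the highest score.

```python
import numpy as np

# Hypothetical class scores for a 2x3 image and 3 classes, shape (classes, H, W)
scores = np.array([
    [[0.90, 0.2, 0.1], [0.8, 0.1, 0.3]],  # class 0 (background)
    [[0.05, 0.7, 0.2], [0.1, 0.6, 0.1]],  # class 1
    [[0.05, 0.1, 0.7], [0.1, 0.3, 0.6]],  # class 2
])

# Each pixel takes the class whose score is highest
label_map = scores.argmax(axis=0)
print(label_map)
```

The `segment` function later in this article does the same thing with `torch.argmax` on the model output.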

![](https://aiuai.cn/uploads/2009/efaefd660d81f558.png)

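Figure: An input image processed by the DeepLab model. The expected output is a set of semantic labels overlaid on the input image. A semantic segmentation model assigns a semantic label, such as sky, person, or car, to every pixel in the image.

DeepLab is a semantic segmentation architecture proposed by researchers at Google, and an advanced deep learning model for the task. The main versions in the series are:

- DeepLabv1 - uses atrous convolution to explicitly control the resolution at which feature maps are computed within the network;

- DeepLabv2 - uses *atrous spatial pyramid pooling* (ASPP) to robustly segment objects at multiple scales, with filters at multiple sampling rates and effective fields-of-view;

- DeepLabv3 - augments the ASPP module with image-level features to capture longer-range information. It also includes batch normalization (BN) parameters to facilitate training. Specifically, atrous convolution is applied to extract output features at different output strides during training and evaluation, which makes it practical to train BN at output stride 16 and gives better results at output stride 8 during evaluation.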

> [google - deeplab](https://cloud.google.com/tpu/docs/tutorials/deeplab)
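Atrous (dilated) convolution is the building block shared by all three versions. The sketch below is a simplified 1-D NumPy illustration, not DeepLab's implementation; the function name `atrous_conv1d` and the toy signal are assumptions. It shows how spacing the kernel taps `rate` samples apart widens the receptive field without adding weights or downsampling the input.

```python
import numpy as np

def atrous_conv1d(signal, kernel, rate=1):
    """Valid-mode 1-D correlation whose taps are spaced `rate` samples apart."""
    k = len(kernel)
    span = (k - 1) * rate + 1            # effective receptive field
    out_len = len(signal) - span + 1
    return np.array([
        sum(kernel[j] * signal[i + j * rate] for j in range(k))
        for i in range(out_len)
    ])

x = np.arange(10, dtype=float)
w = np.array([1.0, 0.0, -1.0])           # same three weights in both calls

y1 = atrous_conv1d(x, w, rate=1)         # each output sees 3 input samples
y2 = atrous_conv1d(x, w, rate=2)         # each output sees 5 input samples
print(len(y1), len(y2))
```

With `rate=2` the same three weights cover five input samples, which is the idea ASPP exploits by running several rates in parallel.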

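## 2. Implementing Semantic Segmentation

### 2.1. Decoding the Segmentation Output

This function converts the 2D segmentation map into an RGB image, mapping each label to its corresponding color.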

```python
import numpy as np

def decode_segmap(image, nc=21):
    """Map a 2D array of class indices to an RGB image of shape (H, W, 3)."""
    # Row i is the color for class i. Note that bird (3) and person (15)
    # are both white here, so either class shows up as white in the output.
    label_colors = np.array([(0, 0, 0),  # 0=background
    # 1=aeroplane, 2=bicycle, 3=bird, 4=boat, 5=bottle
    (128, 0, 0), (0, 128, 0), (255, 255, 255), (0, 0, 128), (128, 0, 128),
    # 6=bus, 7=car, 8=cat, 9=chair, 10=cow
    (0, 128, 128), (128, 128, 128), (64, 0, 0), (192, 0, 0), (64, 128, 0),
    # 11=dining table, 12=dog, 13=horse, 14=motorbike, 15=person
    (192, 128, 0), (64, 0, 128), (192, 0, 128), (64, 128, 128), (255, 255, 255),
    # 16=potted plant, 17=sheep, 18=sofa, 19=train, 20=tv/monitor
    (0, 64, 0), (128, 64, 0), (0, 192, 0), (128, 192, 0), (0, 64, 128)])

    r = np.zeros_like(image).astype(np.uint8)
    g = np.zeros_like(image).astype(np.uint8)
    b = np.zeros_like(image).astype(np.uint8)
    for label in range(nc):
        idx = image == label
        r[idx] = label_colors[label, 0]
        g[idx] = label_colors[label, 1]
        b[idx] = label_colors[label, 2]
    # Stack the channels once, after every class has been painted
    rgb = np.stack([r, g, b], axis=2)
    return rgb
```
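The label-to-color mapping can also be written as a single array lookup. The sketch below uses a hypothetical three-row palette and a toy label map (both are assumptions for illustration) to show that indexing a palette with an integer label map returns one color per pixel.

```python
import numpy as np

# Hypothetical 3-class palette; the table in decode_segmap has 21 rows
palette = np.array([(0, 0, 0), (128, 0, 0), (255, 255, 255)], dtype=np.uint8)
labels = np.array([[0, 1], [2, 2]])   # toy 2x2 label map

# Integer-array indexing: each label is replaced by its palette row
rgb_toy = palette[labels]
print(rgb_toy.shape)
```

This gives the same kind of `(H, W, 3)` array as the loop in `decode_segmap`, provided every label is a valid row index.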

### 2.2. Image Preprocessing and Forward Pass

```python
import matplotlib.pyplot as plt
import torchvision.transforms as T
import torch
from PIL import Image

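def segment(net, path, show_orig=True, dev='cuda'):
    # dev='cuda' needs a GPU; pass dev='cpu' otherwise.
    # Force 3 channels so RGBA or grayscale inputs match the Normalize stats
    img = Image.open(path).convert('RGB')
    if show_orig:
        plt.imshow(img)
        plt.axis('off')
        plt.show()

    # No Resize/CenterCrop: the image is segmented at full resolution,
    # so the mask has the same height and width as the input
    trf = T.Compose([T.ToTensor(),
                     T.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])])
    inp = trf(img).unsqueeze(0).to(dev)
    # Inference only, so skip gradient tracking
    with torch.no_grad():
        out = net.to(dev)(inp)['out']
    # Per-pixel class index: argmax over the class dimension
    om = torch.argmax(out.squeeze(), dim=0).detach().cpu().numpy()
    rgb = decode_segmap(om)
    return rgb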
```

### 2.3. Loading the deeplabv3_resnet101 Model

```python
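from torchvision import models
# pretrained=True loads the pretrained weights; newer torchvision releases
# replace this flag with the `weights=` argument
dlab = models.segmentation.deeplabv3_resnet101(pretrained=True).eval()

rgb = segment(dlab, '/path/to/test.jpg')
# If the image contains several labeled objects, keep only the target (white)
# class as foreground by zeroing every value that is not 255
rgb[rgb != 255] = 0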
```
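The filter above relies on the target class being painted white in the color map. An alternative is to build the mask straight from the class indices. This is only a sketch: `class_mask` is a hypothetical helper, and it assumes you have access to the label map (`om` inside `segment`), which the function as written does not return.

```python
import numpy as np

PERSON = 15  # index of "person" in the label list used by decode_segmap

def class_mask(label_map, class_id):
    """Return a 3-channel uint8 mask: 255 where label_map == class_id, else 0."""
    mask = np.where(label_map == class_id, 255, 0).astype(np.uint8)
    # Repeat the single channel so the mask lines up with a color image
    return np.repeat(mask[:, :, None], 3, axis=2)

toy_labels = np.array([[0, 15], [7, 15]])   # toy label map: background, person, car
toy_mask = class_mask(toy_labels, PERSON)
print(toy_mask[:, :, 0])
```

Selecting by index avoids depending on the palette, so two classes that happen to share a color cannot be confused.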

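## 3. Background Blending (Alpha Blending)

Alpha blending overlays a foreground image with transparency onto a background image. The transparency is usually the fourth channel of the image (as in a transparent PNG), but it can also be a separate image. This transparency mask is commonly called the alpha mask or alpha matte.

The math behind alpha blending is straightforward: at every pixel, the alpha mask is used to combine the foreground color with the background color.

The foreground image is the original image, the grayscale alpha mask is the result of semantic segmentation, and the background image is the new custom image to swap in.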
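Written out, the per-pixel combination is `output = alpha * foreground + (1 - alpha) * background`, with `alpha` in the range 0 to 1. The following is a minimal NumPy sketch of that formula on toy data; the helper name `alpha_blend` and the sample values are assumptions for illustration.

```python
import numpy as np

def alpha_blend(foreground, background, alpha):
    """Blend two uint8 images of equal shape; alpha holds weights in [0, 1]."""
    fg = foreground.astype(np.float64)
    bg = background.astype(np.float64)
    out = alpha * fg + (1.0 - alpha) * bg
    return np.clip(out, 0, 255).round().astype(np.uint8)

fg = np.full((1, 3, 3), 200, dtype=np.uint8)   # 1x3 image, all channels 200
bg = np.full((1, 3, 3), 100, dtype=np.uint8)   # 1x3 image, all channels 100
# One alpha per pixel, broadcast over the three color channels
alpha = np.array([1.0, 0.5, 0.0]).reshape(1, 3, 1)

blended = alpha_blend(fg, bg, alpha)
print(blended[0, :, 0])
```

Where alpha is 1 the foreground pixel is kept, where it is 0 the background shows through, and values in between mix the two.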

```python
# Alpha blending to customize the background of the image
import cv2
import numpy as np
# Assumes Google Colab; elsewhere use cv2.imshow('window', image) instead
from google.colab.patches import cv2_imshow

# Read the images
foreground = cv2.imread("/path/to/foreground.jpg")
background = cv2.imread("/path/to/background.jpg", cv2.IMREAD_COLOR)
# Resize the new background to the size of the segmentation mask
background = cv2.resize(background, (rgb.shape[1], rgb.shape[0]), interpolation=cv2.INTER_AREA)
alpha = rgb  # segmentation mask from section 2.3

# Convert uint8 to float
foreground = foreground.astype(float)
background = background.astype(float)
# Normalize the alpha mask to keep intensity between 0 and 1
alpha = alpha.astype(float) / 255
# Multiply the foreground with the alpha matte
foreground = cv2.multiply(alpha, foreground)
# Multiply the background with (1 - alpha)
background = cv2.multiply(1.0 - alpha, background)
# Add the masked foreground and background
outImage = cv2.add(foreground, background)
# Convert back to uint8 for display and saving
outImage = outImage.astype(np.uint8)

# Display image
cv2_imshow(outImage)
cv2.waitKey(0)
# Save/download image
cv2.imwrite('org_plus_cust_bkg_img.png', outImage)
```

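For example:

![](https://aiuai.cn/uploads/2009/34fac0262465e910.png)

Figure: Background replacement result using semantic segmentation and alpha blending

![](https://aiuai.cn/uploads/2009/6dc6a793a8dc46c7.png)

Figure: The processing steps of semantic segmentation and alpha blending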

## 4. Whitening the Background

```python
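# Whiten the background of the image
# Assumption: `img` is the image that was segmented, loaded as a uint8 array
# with the same height and width as `rgb`
img = cv2.imread('/path/to/test.jpg')
# Two saturating subtractions keep the original pixels where the mask is 255
# and leave 0 everywhere else
mask_out = cv2.subtract(rgb, img)
mask_out = cv2.subtract(rgb, mask_out)
# Paint the background white
mask_out[rgb == 0] = 255

# Display the original and the result side by side
numpy_horizontal_concat = np.concatenate((img, mask_out), axis=1)
cv2_imshow(numpy_horizontal_concat)
cv2.waitKey(0)
# Save/download the resulting image
cv2.imwrite('org_plus_white_bkg_image.jpeg', numpy_horizontal_concat)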
```
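The two subtractions can be hard to read at a glance. Assuming the mask contains only 0 and 255, the same result can be expressed as a single selection: keep the image where the mask is 255 and use white elsewhere. The sketch below is a NumPy illustration on toy data; `whiten_background` is a hypothetical helper name.

```python
import numpy as np

def whiten_background(image, mask):
    """Keep pixels where mask is 255; paint everything else white."""
    return np.where(mask == 255, image, 255).astype(np.uint8)

# Toy 1x2 image: the first pixel is foreground, the second is background
toy_img = np.array([[[10, 20, 30], [40, 50, 60]]], dtype=np.uint8)
toy_mask = np.array([[[255, 255, 255], [0, 0, 0]]], dtype=np.uint8)

result = whiten_background(toy_img, toy_mask)
print(result)
```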

For example:

![](https://aiuai.cn/uploads/2009/0525edae39a9cdb8.png)

Figure: Background whitening result

## 5. Blurring the Background

```python
# Read the image (assumed to be the one that was segmented into `rgb`)
foreground = cv2.imread("/path/to/girl8.jpg")
# Keep a uint8 copy of the original for the side-by-side display
img = foreground.copy()

# Create a Gaussian blur of kernel size 7 for the background image
blurredImage = cv2.GaussianBlur(foreground, (7, 7), 0)
# Convert uint8 to float
foreground = foreground.astype(float)
blurredImage = blurredImage.astype(float)

# Create a binary mask of the RGB output map using the threshold value 0
th, alpha = cv2.threshold(rgb, 0, 255, cv2.THRESH_BINARY)

# Apply a slight blur to the mask to soften edges
alpha = cv2.GaussianBlur(alpha, (7, 7), 0)
# Normalize the alpha mask to keep intensity between 0 and 1
alpha = alpha.astype(float) / 255
# Multiply the foreground with the alpha matte
foreground = cv2.multiply(alpha, foreground)
# Multiply the blurred background with (1 - alpha)
background = cv2.multiply(1.0 - alpha, blurredImage)
# Add the masked foreground and background
outImage = cv2.add(foreground, background)

# Convert back to uint8 so the result matches the original for display
outImage = outImage.astype(np.uint8)
numpy_horizontal_concat = np.concatenate((img, outImage), axis=1)

# Display image
cv2_imshow(numpy_horizontal_concat)
cv2.waitKey(0)
# Save/download the resulting image
cv2.imwrite('res.png', numpy_horizontal_concat)
```

For example:

![](https://aiuai.cn/uploads/2009/65dcc31184dd4055.png)

Figure: Result of blending the original foreground with its blurred background

![](https://aiuai.cn/uploads/2009/7386fe701048c454.png)

## 6. Converting the Background to Grayscale

```python
# Load the foreground input image
foreground = cv2.imread("/path/to/woman-with-car.jpg")

# Resize image to match shape of R-band in RGB output map
foreground = cv2.resize(foreground, (rgb.shape[1], rgb.shape[0]), interpolation=cv2.INTER_AREA)
# Keep a uint8 copy of the original for the side-by-side display
img = foreground.copy()
# Create a background image by copying foreground and converting into grayscale
background = cv2.cvtColor(foreground, cv2.COLOR_BGR2GRAY)
# Convert single-channel grayscale image to 3-channel grayscale image
background = cv2.cvtColor(background, cv2.COLOR_GRAY2BGR)
# Convert uint8 to float
foreground = foreground.astype(float)
background = background.astype(float)
# Create a binary mask of the RGB output map using the threshold value 0
th, alpha = cv2.threshold(rgb, 0, 255, cv2.THRESH_BINARY)
# Apply a slight blur to the mask to soften edges
alpha = cv2.GaussianBlur(alpha, (7, 7), 0)
# Normalize the alpha mask to keep intensity between 0 and 1
alpha = alpha.astype(float) / 255
# Multiply the foreground with the alpha matte
foreground = cv2.multiply(alpha, foreground)
# Multiply the background with (1 - alpha)
background = cv2.multiply(1.0 - alpha, background)
# Add the masked foreground and background
outImage = cv2.add(foreground, background)
# Convert back to uint8 so the result matches the original for display
outImage = outImage.astype(np.uint8)
numpy_horizontal_concat = np.concatenate((img, outImage), axis=1)

# Display image
cv2_imshow(numpy_horizontal_concat)
cv2.waitKey(0)
# Save image
cv2.imwrite('res.png', numpy_horizontal_concat)
```

For example:

![](https://aiuai.cn/uploads/2009/9c3889ef8a954d3b.png)

Figure: Grayscale background result

![](https://aiuai.cn/uploads/2009/a7729d45313b5b86.png)

Figure: The processing steps for the grayscale background

## References

1. Alpha Blending — https://www.learnopencv.com/alpha-blending-using-opencv-cpp-python/
2. Semantic Segmentation — https://www.learnopencv.com/pytorch-for-beginners-semantic-segmentation-using-torchvision/
3. DeepLab — https://github.com/tensorflow/models/tree/master/research/deeplab
4. https://www.learnopencv.com/applications-of-foreground-background-separation-with-semantic-segmentation/

Last modified: September 17, 2020
