语义分割 U2net网络学习笔记（附代码）

本文主要是介绍语义分割 U2net网络学习笔记（附代码），希望对大家解决编程问题提供一定的参考价值，需要的开发者们随着小编来一起学习吧！

论文地址：https://arxiv.org/abs/2005.09007
代码地址：https://github.com/xuebinqin/U-2-Net

强烈建议看一下B站霹导的原理讲解视频和代码讲解视频，链接放到最后了

1.是什么？

U2-Net是一种用于显着性目标检测的深度学习网络，它采用了嵌套U结构，可以在保持高精度的同时实现快速的图像分割。它在多个数据集上进行了测试，并取得了优异的表现。U2-Net的预训练模型u2net.pth可以用于图像分割任务，例如去除图像背景等。

2.为什么？

3.怎么样？

3.1网络结构

下图是原论文中的图5，该图展示了整个U2Net网络的结构。通过下图可以看到网络的主体是一个类似UNet的结构，网络的中的每个Encoder和Decoder模块也是类似UNet的结构，也就是在大的UNet中嵌入了一堆小UNet，所以作者给网络取名为U2Net。其实正确的名称是 $U^{2}-Net$ ，但是打平方符号太麻烦了，所以直接简写成U2Net。

通过上图可以看出，En_1、En_2、En_3、En_4、De_1、De_2、De_3、De_4采用的是同一种Block，只不过深度不同。该Block就是论文中提出的ReSidual U-block简称RSU。详情可见下图,下图展示的是RSU-7结构，其中7代表深度，注意最下面的3x3卷积采用的是膨胀卷积，膨胀因子为2。

下图是重绘的RSU-7结构，图中标出了每个输出特征图的shape方便大家进一步理解。

弄清楚RSU结构后，再回过头看U2Net结构。其中En_1和De_1采用的是RSU-7，En_2和De_2采用的是RSU-6，En_3和De_3采用的是RSU-5，En_4和De_4采用的是RSU-4，最后还剩下En_5、En_6和De_5三个模块。这三个模块采用的是RSU-4F，注意RSU-4F和RSU-4两者结构并不相同。在RSU-4F中并没有进行下采样或上采样，而是将采样层全部替换成了膨胀卷积。作者在论文3.2章节中的解释是到En_5时，特征图的分辨率已经很低了，如果接着下采样会丢失很多上下文信息，所以在RSU-4F中就不再进行下采样了。下图是我绘制的RSU-4F，其中带参数d的卷积层全部是膨胀卷积，d为膨胀系数。

接着再来看下saliency map fusion module即显著特征融合模块，通过该模块将不同尺度的saliency map进行融合并得到最终预测概率图。如下图所示，首先收集De_1、De_2、De_3、De_4、De_5以及En_6的输出，然后分别通过一个3x3的卷积层得到channel为1的特征图，接着通过双线性插值缩放到输入图片大小得到Sup1、Sup2、Sup3、Sup4、Sup5和Sup6，然后将这6个特征图进行Concat拼接。最后通过一个1x1的卷积层以及Sigmiod激活函数得到最终的预测概率图。

3.2 损失函数

在U2Net中损失计算公式如下所示：

该损失函数可以看成两部分，一部分是上述提到的Sup1、Sup2、Sup3、Sup4、Sup5和Sup6与GT之间的损失(注意，在计算损失前需要将Sup1、Sup2、Sup3、Sup4、Sup5和Sup6通过Sigmoid激活函数得到对应的概率图)，即 $\sum_{m=1}^{M}w_{side}^{(m)}l_{side}^{m}$ ,另一部分是最终融合得到的概率图与GT之间的损失，即 $w_{f_{use}} l_{f_{use}}$ 。其中l ll是二值交叉熵损失(binary cross-entropy loss)，w ww是各损失之间的平衡系数，在源码中w ww全部等于1，M等于6即Sup1至Sup6。

3.3代码实现

from typing import Union, List
import torch
import torch.nn as nn
import torch.nn.functional as Fclass ConvBNReLU(nn.Module):def __init__(self, in_ch: int, out_ch: int, kernel_size: int = 3, dilation: int = 1):super().__init__()padding = kernel_size // 2 if dilation == 1 else dilationself.conv = nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding, dilation=dilation, bias=False)self.bn = nn.BatchNorm2d(out_ch)self.relu = nn.ReLU(inplace=True)def forward(self, x: torch.Tensor) -> torch.Tensor:return self.relu(self.bn(self.conv(x)))class DownConvBNReLU(ConvBNReLU):def __init__(self, in_ch: int, out_ch: int, kernel_size: int = 3, dilation: int = 1, flag: bool = True):super().__init__(in_ch, out_ch, kernel_size, dilation)self.down_flag = flagdef forward(self, x: torch.Tensor) -> torch.Tensor:if self.down_flag:x = F.max_pool2d(x, kernel_size=2, stride=2, ceil_mode=True)return self.relu(self.bn(self.conv(x)))class UpConvBNReLU(ConvBNReLU):def __init__(self, in_ch: int, out_ch: int, kernel_size: int = 3, dilation: int = 1, flag: bool = True):super().__init__(in_ch, out_ch, kernel_size, dilation)self.up_flag = flagdef forward(self, x1: torch.Tensor, x2: torch.Tensor) -> torch.Tensor:if self.up_flag:x1 = F.interpolate(x1, size=x2.shape[2:], mode='bilinear', align_corners=False)return self.relu(self.bn(self.conv(torch.cat([x1, x2], dim=1))))class RSU(nn.Module):def __init__(self, height: int, in_ch: int, mid_ch: int, out_ch: int):super().__init__()assert height >= 2self.conv_in = ConvBNReLU(in_ch, out_ch)encode_list = [DownConvBNReLU(out_ch, mid_ch, flag=False)]decode_list = [UpConvBNReLU(mid_ch * 2, mid_ch, flag=False)]for i in range(height - 2):encode_list.append(DownConvBNReLU(mid_ch, mid_ch))decode_list.append(UpConvBNReLU(mid_ch * 2, mid_ch if i < height - 3 else out_ch))encode_list.append(ConvBNReLU(mid_ch, mid_ch, dilation=2))self.encode_modules = nn.ModuleList(encode_list)self.decode_modules = nn.ModuleList(decode_list)def forward(self, x: torch.Tensor) -> torch.Tensor:x_in = self.conv_in(x)x = x_inencode_outputs = []for m in self.encode_modules:x = m(x)encode_outputs.append(x)x = encode_outputs.pop()for m in self.decode_modules:x2 = encode_outputs.pop()x = m(x, x2)return x + x_inclass RSU4F(nn.Module):def __init__(self, in_ch: int, mid_ch: int, out_ch: int):super().__init__()self.conv_in = ConvBNReLU(in_ch, out_ch)self.encode_modules = nn.ModuleList([ConvBNReLU(out_ch, mid_ch),ConvBNReLU(mid_ch, mid_ch, dilation=2),ConvBNReLU(mid_ch, mid_ch, dilation=4),ConvBNReLU(mid_ch, mid_ch, dilation=8)])self.decode_modules = nn.ModuleList([ConvBNReLU(mid_ch * 2, mid_ch, dilation=4),ConvBNReLU(mid_ch * 2, mid_ch, dilation=2),ConvBNReLU(mid_ch * 2, out_ch)])def forward(self, x: torch.Tensor) -> torch.Tensor:x_in = self.conv_in(x)x = x_inencode_outputs = []for m in self.encode_modules:x = m(x)encode_outputs.append(x)x = encode_outputs.pop()for m in self.decode_modules:x2 = encode_outputs.pop()x = m(torch.cat([x, x2], dim=1))return x + x_inclass U2Net(nn.Module):def __init__(self, cfg: dict, out_ch: int = 1):super().__init__()assert "encode" in cfgassert "decode" in cfgself.encode_num = len(cfg["encode"])encode_list = []side_list = []for c in cfg["encode"]:# c: [height, in_ch, mid_ch, out_ch, RSU4F, side]assert len(c) == 6encode_list.append(RSU(*c[:4]) if c[4] is False else RSU4F(*c[1:4]))if c[5] is True:side_list.append(nn.Conv2d(c[3], out_ch, kernel_size=3, padding=1))self.encode_modules = nn.ModuleList(encode_list)decode_list = []for c in cfg["decode"]:# c: [height, in_ch, mid_ch, out_ch, RSU4F, side]assert len(c) == 6decode_list.append(RSU(*c[:4]) if c[4] is False else RSU4F(*c[1:4]))if c[5] is True:side_list.append(nn.Conv2d(c[3], out_ch, kernel_size=3, padding=1))self.decode_modules = nn.ModuleList(decode_list)self.side_modules = nn.ModuleList(side_list)self.out_conv = nn.Conv2d(self.encode_num * out_ch, out_ch, kernel_size=1)def forward(self, x: torch.Tensor) -> Union[torch.Tensor, List[torch.Tensor]]:_, _, h, w = x.shape# collect encode outputsencode_outputs = []for i, m in enumerate(self.encode_modules):x = m(x)encode_outputs.append(x)if i != self.encode_num - 1:x = F.max_pool2d(x, kernel_size=2, stride=2, ceil_mode=True)# collect decode outputsx = encode_outputs.pop()decode_outputs = [x]for m in self.decode_modules:x2 = encode_outputs.pop()x = F.interpolate(x, size=x2.shape[2:], mode='bilinear', align_corners=False)x = m(torch.concat([x, x2], dim=1))decode_outputs.insert(0, x)# collect side outputsside_outputs = []for m in self.side_modules:x = decode_outputs.pop()x = F.interpolate(m(x), size=[h, w], mode='bilinear', align_corners=False)side_outputs.insert(0, x)x = self.out_conv(torch.concat(side_outputs, dim=1))if self.training:# do not use torch.sigmoid for amp safereturn [x] + side_outputselse:return torch.sigmoid(x)def u2net_full(out_ch: int = 1):cfg = {# height, in_ch, mid_ch, out_ch, RSU4F, side"encode": [[7, 3, 32, 64, False, False],      # En1[6, 64, 32, 128, False, False],    # En2[5, 128, 64, 256, False, False],   # En3[4, 256, 128, 512, False, False],  # En4[4, 512, 256, 512, True, False],   # En5[4, 512, 256, 512, True, True]],   # En6# height, in_ch, mid_ch, out_ch, RSU4F, side"decode": [[4, 1024, 256, 512, True, True],   # De5[4, 1024, 128, 256, False, True],  # De4[5, 512, 64, 128, False, True],    # De3[6, 256, 32, 64, False, True],     # De2[7, 128, 16, 64, False, True]]     # De1}return U2Net(cfg, out_ch)def u2net_lite(out_ch: int = 1):cfg = {# height, in_ch, mid_ch, out_ch, RSU4F, side"encode": [[7, 3, 16, 64, False, False],  # En1[6, 64, 16, 64, False, False],  # En2[5, 64, 16, 64, False, False],  # En3[4, 64, 16, 64, False, False],  # En4[4, 64, 16, 64, True, False],  # En5[4, 64, 16, 64, True, True]],  # En6# height, in_ch, mid_ch, out_ch, RSU4F, side"decode": [[4, 128, 16, 64, True, True],  # De5[4, 128, 16, 64, False, True],  # De4[5, 128, 16, 64, False, True],  # De3[6, 128, 16, 64, False, True],  # De2[7, 128, 16, 64, False, True]]  # De1}return U2Net(cfg, out_ch)def convert_onnx(m, save_path):m.eval()x = torch.rand(1, 3, 288, 288, requires_grad=True)# export the modeltorch.onnx.export(m,  # model being runx,  # model input (or a tuple for multiple inputs)save_path,  # where to save the model (can be a file or file-like object)export_params=True,opset_version=11)if __name__ == '__main__':# n_m = RSU(height=7, in_ch=3, mid_ch=12, out_ch=3)# convert_onnx(n_m, "RSU7.onnx")## n_m = RSU4F(in_ch=3, mid_ch=12, out_ch=3)# convert_onnx(n_m, "RSU4F.onnx")u2net = u2net_full()convert_onnx(u2net, "u2net_full.onnx")

参考：

U2Net网络简介

B站霹雳吧啦

这篇关于语义分割 U2net网络学习笔记（附代码）的文章就介绍到这儿，希望我们推荐的文章对编程师们有所帮助！