mantranet:manipulation tracing network for detection and localization of image forgeries

本文主要是介绍mantranet:manipulation tracing network for detection and localization of image forgeries，希望对大家解决编程问题提供一定的参考价值，需要的开发者们随着小编来一起学习吧！

[论文笔记] 篡改检测：ManTra-Net - 知乎[论文笔记] ManTra-Net: Manipulation Tracing Network For Detection And Localization of Image Forgeries With Anomalous Features说在前面个人心得： Manipulation其实就是意味着对图像进行篡改或者说操作讲…https://zhuanlan.zhihu.com/p/266157561这篇文章是比较细的，对影响篡改检的的clue进行了比较细的讨论，clue其实和篡改类型还是相关的。写的其实挺好的，有两个点

1.篡改类型

JPEG compression,edge inconsistences,noise pattern,color consistency,visual similarity,exif consistency and camera model.

2.网络结构

mantranet核心是把篡改检测问题当成局部异常检测问题来做，设计一个z分数特征来捕获异常，并提出一种新的lstm来评估异常区域。

文中有关于这么设计的思路，首先特征提取层是一个vgg16，后面接一个local anomaly detection network，最后还是bce loss，算是标准的语义分割问题，不过也讨论了BayerConv2d是一种约束卷积层的方式，可以抑制图像痕迹的影响，自适应提取图像的篡改特征。SRMConv2D是隐写分析的滤波器，利用srm层提取的噪声特征来发现真实和篡改区域之间的噪声不一致。这些属于在结构上的优化设计，在后续的天池的篡改比赛中，在文档场景也有提出了这些clue设计的滤波器其实是效果不太好的，并不能针对特定场景有比较好的效果。其次，mvssnet中也说了把篡改检测问题单纯建模成语义分割问题是次优，需要有针对性的设计网络。

核心来讲一下local anomaly detection network，首先由一个并行的zpool2d模块，这块的设计思路是：1.什么是主要特征如何计算，2.如何量化局部特征和参考知己难得差异。

class Zpool2D_Window(nn.Module):def __init__(self, inputChannels, window_size_list, min_value=1e-5) -> None:super().__init__()self.min_value = min_valueself.window_size_list = window_size_listself.maxWinSize = np.max(window_size_list)self.TinyWeight  = nn.Parameter(torch.full([1,1,inputChannels,1,1], min_value, dtype=torch.float32) ,requires_grad= True)self.TinyWeight.data.clamp(min=0)def _init_padding_buff(self, x): # include Cumulate sumpaddingLayer = nn.ZeroPad2d(self.maxWinSize//2 + 1)x_pad = paddingLayer(x)x_cum = torch.cumsum(x_pad, 2)x_cum = torch.cumsum(x_cum, 3)return x_cumdef _compute_a_window_avg(self, x, winSize):# --left top Big square block-- coordinatetop = self.maxWinSize // 2 - winSize // 2bottom = top + winSizeleft = self.maxWinSize // 2 - winSize // 2right = left + winSize Ax, Ay = (left, top)Bx, By = (right, top)Cx, Cy = (right, bottom)Dx, Dy = (left, bottom)# negative number , but can be parse to a positve when using fomula like this -> [:-1]# --right bottom Big square block-- coordinatetop0 = -self.maxWinSize // 2 - winSize // 2 - 1bottom0 = top0 + winSize left0 = -self.maxWinSize // 2 - winSize // 2 - 1 right0 = left0 + winSizeAx0, Ay0 = (left0, top0)Bx0, By0 = (right0, top0)Cx0, Cy0 = (right0, bottom0)Dx0, Dy0 = (left0, bottom0)counts = torch.ones_like(x)# print(counts)counts_pading = self._init_padding_buff(counts)# print(counts_pading)x_padding = self._init_padding_buff(x)counts_2d = counts_pading[:,:,Ay:Ay0, Ax:Ax0] \+ counts_pading[:,:,Cy:Cy0, Cx:Cx0] \- counts_pading[:,:,By:By0, Bx:Bx0] \- counts_pading[:,:,Dy:Dy0, Dx:Dx0]sum_x_2d = x_padding[:,:,Ay:Ay0, Ax:Ax0] \+ x_padding[:,:,Cy:Cy0, Cx:Cx0] \- x_padding[:,:,By:By0, Bx:Bx0] \- x_padding[:,:,Dy:Dy0, Dx:Dx0]avg_x_2d = sum_x_2d / counts_2dreturn avg_x_2ddef forward(self, x):outputFeature = []# 1. windowfor win in self.window_size_list:avg_x_2d = self._compute_a_window_avg(x, win)D_x = x - avg_x_2doutputFeature.append(D_x)# 2. globalmu_f = torch.mean(x, dim=(2,3), keepdim=True)D_f = x - mu_foutputFeature.append(D_f)# 5 Dim Tensor arrange : (Batch, Diff_Windows, channel, width, height )outputFeature = torch.stack(outputFeature,1) std_x = torch.std(outputFeature, dim=(3,4),keepdim=True)std_x = torch.maximum(std_x, self.TinyWeight + self.min_value / 10.)x = torch.stack([x for i in range(len(self.window_size_list)+ 1) ], dim=1)Z_f = x / std_xreturn Z_f

zpool2d的世界遵循了一下的公式：