This article walks through the code of Spatial Transformer Networks (STN); hopefully it offers some useful reference for developers working on the same problem.
This is a relatively early paper on attention: early, influential, and effective in practice.
There are plenty of write-ups explaining the paper already, so I won't repeat them here.
The suggested workflow: read an explanation until the principle is clear, then step through the code alongside it; once both make sense, you can adapt the code to wherever you need it.
For example, a write-up and an implementation to read side by side:
an article explaining the paper (link)
a code repository (link)
In short: before classification, the input image is first warped by a transformation matrix to produce a new image, and the new image is what gets classified.
So the core steps are:
1. Obtain the transformation matrix: a 2×3 matrix that can express translation, scaling, rotation, cropping, and so on.
2. Use the transformation matrix to build the coordinate mapping between the image before and after the affine transformation, i.e. the grid.
3. Apply the grid to the original image to get the new image, then run it through the convolutional classifier.
A minimal standalone sketch of these three steps follows.
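To make the three steps concrete, here is a small self-contained sketch; the tensor sizes and the translation value are made up for illustration, and theta is hand-written instead of being predicted by a network:

import torch
import torch.nn.functional as F

N, C, H, W = 1, 1, 8, 8
img = torch.arange(N * C * H * W, dtype=torch.float32).view(N, C, H, W)

# Step 1: a 2x3 transformation matrix (in an STN this is predicted by the
# localization network). This one is a pure translation.
theta = torch.tensor([[[1., 0., 0.5],
                       [0., 1., 0.0]]])  # shape [N, 2, 3]

# Step 2: the coordinate mapping (grid) between output and input pixels.
grid = F.affine_grid(theta, torch.Size((N, C, H, W)), align_corners=False)

# Step 3: sample the original image with the grid to obtain the new image.
warped = F.grid_sample(img, grid, align_corners=False)
print(grid.shape, warped.shape)  # torch.Size([1, 8, 8, 2]) torch.Size([1, 1, 8, 8])

Note that theta maps output coordinates to input coordinates (both normalized to [-1, 1]), so a positive tx samples the input further to the right, which makes the image content appear shifted to the left.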
Example code of a network that uses an STN module:
import torch
import torch.nn as nn
import torch.nn.functional as F

import STNModule  # provides the SpatialTransformer shown further below


class STNSVHNet(nn.Module):
    def __init__(self, spatial_dim, in_channels, stn_kernel_size, kernel_size,
                 num_classes=10, use_dropout=False):
        super(STNSVHNet, self).__init__()
        self._in_ch = in_channels
        self._ksize = kernel_size
        self._sksize = stn_kernel_size
        self.ncls = num_classes
        self.dropout = use_dropout
        self.drop_prob = 0.5
        self.stride = 1
        self.spatial_dim = spatial_dim
        # the spatial transformer: warps the input before classification
        self.stnmod = STNModule.SpatialTransformer(self._in_ch, self.spatial_dim, self._sksize)
        # a plain convolutional classifier applied to the warped image
        self.conv1 = nn.Conv2d(self._in_ch, 32, kernel_size=self._ksize, stride=self.stride, padding=1, bias=False)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=self._ksize, stride=1, padding=1, bias=False)
        self.conv3 = nn.Conv2d(64, 128, kernel_size=self._ksize, stride=1, padding=1, bias=False)
        self.fc1 = nn.Linear(128*4*4, 3092)
        self.fc2 = nn.Linear(3092, self.ncls)

    def forward(self, x):
        rois, affine_grid = self.stnmod(x)  # warped input plus the sampling grid
        out = F.relu(self.conv1(rois))
        out = F.max_pool2d(out, 2)
        out = F.relu(self.conv2(out))
        out = F.max_pool2d(out, 2)
        out = F.relu(self.conv3(out))
        out = F.max_pool2d(out, 2)  # pools a 32x32 input down to 4x4 so the flatten
                                    # matches fc1; the original listing omitted this
        out = out.view(-1, 128*4*4)
        if self.dropout:
            out = F.dropout(self.fc1(out), p=0.5)
        else:
            out = self.fc1(out)
        out = self.fc2(out)
        return out
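A hypothetical usage sketch, reusing the imports above and assuming 32x32 RGB SVHN inputs (the argument values are illustrative):

model = STNSVHNet(spatial_dim=(32, 32), in_channels=3, stn_kernel_size=3,
                  kernel_size=3, num_classes=10)
x = torch.randn(4, 3, 32, 32)  # a dummy batch of 4 SVHN-sized images
logits = model(x)
print(logits.shape)            # torch.Size([4, 10])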
The SpatialTransformer module it calls (STNModule) is as follows:
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpatialTransformer(nn.Module):
    """
    Implements a spatial transformer as proposed in the Jaderberg paper.
    Consists of 3 parts:
    1. A localization net
    2. A grid generator
    3. A sampling (roi-pooled) module
    The current implementation uses a very small convolutional net with
    four convolutional layers and two fully connected layers. Backends
    can be swapped in favor of VGG, ResNets etc.

    Returns:
        A roi feature map with the same spatial dimensions as the input feature map.
    """
    def __init__(self, in_channels, spatial_dims, kernel_size, use_dropout=False):
        super(SpatialTransformer, self).__init__()
        self._h, self._w = spatial_dims
        self._in_ch = in_channels
        self._ksize = kernel_size
        self.dropout = use_dropout

        # localization net
        self.conv1 = nn.Conv2d(in_channels, 32, kernel_size=self._ksize, stride=1, padding=1, bias=False)  # size: [1x3x32x32]
        self.conv2 = nn.Conv2d(32, 32, kernel_size=self._ksize, stride=1, padding=1, bias=False)
        self.conv3 = nn.Conv2d(32, 32, kernel_size=self._ksize, stride=1, padding=1, bias=False)
        self.conv4 = nn.Conv2d(32, 32, kernel_size=self._ksize, stride=1, padding=1, bias=False)
        self.fc1 = nn.Linear(32*4*4, 1024)
        self.fc2 = nn.Linear(1024, 6)  # the 6 parameters of the 2x3 affine matrix

    def forward(self, x):
        """Forward pass of the STN module. x -> input feature map"""
        batch_images = x
        # detach: the localization branch does not back-propagate into the input
        x = F.relu(self.conv1(x.detach()))
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2)
        x = F.relu(self.conv3(x))
        x = F.max_pool2d(x, 2)
        x = F.relu(self.conv4(x))  # the original called conv3 twice here, leaving conv4 unused
        x = F.max_pool2d(x, 2)
        print("Pre view size:{}".format(x.size()))
        x = x.view(-1, 32*4*4)
        if self.dropout:
            x = F.dropout(self.fc1(x), p=0.5)
            x = F.dropout(self.fc2(x), p=0.5)
        else:
            x = self.fc1(x)
            x = self.fc2(x)  # params [Nx6]
        x = x.view(-1, 2, 3)  # reshape to the 2x3 affine matrix theta
        print(x.size())
        affine_grid_points = F.affine_grid(x, torch.Size((x.size(0), self._in_ch, self._h, self._w)))
        assert affine_grid_points.size(0) == batch_images.size(0), \
            "The batch sizes of the input images must be same as the generated grid."
        rois = F.grid_sample(batch_images, affine_grid_points)
        print("rois found to be of size:{}".format(rois.size()))
        return rois, affine_grid_points
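A quick standalone shape check of the module (the 32x32 input size is an assumption implied by the 32*4*4 flatten; the weights are untrained, so theta is random, but the shapes are what matter here):

stn = SpatialTransformer(in_channels=3, spatial_dims=(32, 32), kernel_size=3)
x = torch.randn(2, 3, 32, 32)
rois, grid = stn(x)
print(rois.shape, grid.shape)  # torch.Size([2, 3, 32, 32]) torch.Size([2, 32, 32, 2])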
The core is just these two lines:
affine_grid_points = F.affine_grid(x, torch.Size((x.size(0), self._in_ch, self._h, self._w)))
rois = F.grid_sample(batch_images, affine_grid_points)
The following reference is helpful for understanding these two calls:
Affine transformations in PyTorch (affine_grid)
- batch_images: the original input images.
- x: the 2×3 transformation matrix theta, produced from the input by the localization network (a stack of convolutional and fully connected layers).
- the second argument of F.affine_grid: the output shape of the affine transformation, in [N, C, H, W] format; here it is chosen so the output has the same size as the original image.
- F.affine_grid: produces affine_grid_points, the coordinate mapping between the transformed output and the input. It returns a 4-D tensor of shape [N, H, W, 2], where N, H, and W are the batch size, height, and width of the output feature map, and the last dimension holds the normalized (x, y) sampling coordinates.
- F.grid_sample: applies this mapping to the original image to produce the new image, which is then fed through the convolutional classifier; a sanity check of the pair of calls follows this list.
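With an identity theta, the grid maps every output pixel back to the same input location, so grid_sample should reproduce the input exactly. Here align_corners is passed explicitly so that affine_grid and grid_sample agree (the original listing omits it):

import torch
import torch.nn.functional as F

theta_id = torch.tensor([[[1., 0., 0.],
                          [0., 1., 0.]]])      # identity affine matrix
img = torch.rand(1, 3, 32, 32)
grid = F.affine_grid(theta_id, torch.Size((1, 3, 32, 32)), align_corners=False)
out = F.grid_sample(img, grid, align_corners=False)
print(grid.shape)                              # torch.Size([1, 32, 32, 2])
print(torch.allclose(out, img, atol=1e-6))     # True: the identity grid reproduces the input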
Because the whole pipeline is trained with ordinary supervised learning, x (the affine parameters theta) needs no annotation of its own: the localization network that predicts it is updated end-to-end by the gradients of the classification loss, as sketched below. Everything else follows from that.
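A minimal training-loop sketch under these assumptions; train_loader is a hypothetical SVHN DataLoader and the hyperparameters are placeholders. The classification loss is the only supervision signal:

import torch
import torch.nn as nn
import torch.optim as optim

model = STNSVHNet(spatial_dim=(32, 32), in_channels=3, stn_kernel_size=3,
                  kernel_size=3, num_classes=10)
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
criterion = nn.CrossEntropyLoss()

for images, labels in train_loader:   # train_loader: a hypothetical SVHN DataLoader
    optimizer.zero_grad()
    logits = model(images)
    loss = criterion(logits, labels)  # no direct supervision on theta
    loss.backward()                   # gradients flow through grid_sample and
                                      # affine_grid into the localization net
    optimizer.step()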
That concludes this code walkthrough of Spatial Transformer Networks (STN); I hope it proves helpful.