This article walks through the code of Spatial Transformer Networks (STN); hopefully it offers some useful reference for developers working on the same problem.
This is a relatively early paper on attention: early, influential, and effective in practice.
There are plenty of write-ups explaining the paper already, so I won't repeat them here.
The suggested workflow: read an explanation until the principle is clear, then step through the code alongside it; once both make sense, you can adapt the code to wherever you need it.
For example, a write-up and an implementation to read side by side:
an article explaining the paper (link)
a code repository (link)
In short: before classification, the input image is first warped by a transformation matrix to produce a new image, and the new image is what gets classified.
So the core steps are:
1. Obtain the transformation matrix: a 2×3 matrix that can express translation, scaling, rotation, cropping, and so on.
2. Use the transformation matrix to build the coordinate mapping between the image before and after the affine transformation, i.e. the grid.
3. Apply the grid to the original image to get the new image, then run it through the convolutional classifier.
A minimal standalone sketch of these three steps follows.
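To make the three steps concrete, here is a small self-contained sketch; the tensor sizes and the translation value are made up for illustration, and theta is hand-written instead of being predicted by a network:

import torch
import torch.nn.functional as F

N, C, H, W = 1, 1, 8, 8
img = torch.arange(N * C * H * W, dtype=torch.float32).view(N, C, H, W)

# Step 1: a 2x3 transformation matrix (in an STN this is predicted by the
# localization network). This one is a pure translation.
theta = torch.tensor([[[1., 0., 0.5],
                       [0., 1., 0.0]]])  # shape [N, 2, 3]

# Step 2: the coordinate mapping (grid) between output and input pixels.
grid = F.affine_grid(theta, torch.Size((N, C, H, W)), align_corners=False)

# Step 3: sample the original image with the grid to obtain the new image.
warped = F.grid_sample(img, grid, align_corners=False)
print(grid.shape, warped.shape)  # torch.Size([1, 8, 8, 2]) torch.Size([1, 1, 8, 8])

Note that theta maps output coordinates to input coordinates (both normalized to [-1, 1]), so a positive tx samples the input further to the right, which makes the image content appear shifted to the left.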
Example code of a network that uses an STN module:
import torch
import torch.nn as nn
import torch.nn.functional as F

import STNModule  # provides the SpatialTransformer shown further below


class STNSVHNet(nn.Module):
    def __init__(self, spatial_dim, in_channels, stn_kernel_size, kernel_size,
                 num_classes=10, use_dropout=False):
        super(STNSVHNet, self).__init__()
        self._in_ch = in_channels
        self._ksize = kernel_size
        self._sksize = stn_kernel_size
        self.ncls = num_classes
        self.dropout = use_dropout
        self.drop_prob = 0.5
        self.stride = 1
        self.spatial_dim = spatial_dim
        # the spatial transformer: warps the input before classification
        self.stnmod = STNModule.SpatialTransformer(self._in_ch, self.spatial_dim, self._sksize)
        # a plain convolutional classifier applied to the warped image
        self.conv1 = nn.Conv2d(self._in_ch, 32, kernel_size=self._ksize, stride=self.stride, padding=1, bias=False)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=self._ksize, stride=1, padding=1, bias=False)
        self.conv3 = nn.Conv2d(64, 128, kernel_size=self._ksize, stride=1, padding=1, bias=False)
        self.fc1 = nn.Linear(128*4*4, 3092)
        self.fc2 = nn.Linear(3092, self.ncls)

    def forward(self, x):
        rois, affine_grid = self.stnmod(x)  # warped input plus the sampling grid
        out = F.relu(self.conv1(rois))
        out = F.max_pool2d(out, 2)
        out = F.relu(self.conv2(out))
        out = F.max_pool2d(out, 2)
        out = F.relu(self.conv3(out))
        out = F.max_pool2d(out, 2)  # pools a 32x32 input down to 4x4 so the flatten
                                    # matches fc1; the original listing omitted this
        out = out.view(-1, 128*4*4)
        if self.dropout:
            out = F.dropout(self.fc1(out), p=0.5)
        else:
            out = self.fc1(out)
        out = self.fc2(out)
        return out
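A hypothetical usage sketch, reusing the imports above and assuming 32x32 RGB SVHN inputs (the argument values are illustrative):

model = STNSVHNet(spatial_dim=(32, 32), in_channels=3, stn_kernel_size=3,
                  kernel_size=3, num_classes=10)
x = torch.randn(4, 3, 32, 32)  # a dummy batch of 4 SVHN-sized images
logits = model(x)
print(logits.shape)            # torch.Size([4, 10])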
The SpatialTransformer module it calls (STNModule) is as follows:
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpatialTransformer(nn.Module):
    """
    Implements a spatial transformer as proposed in the Jaderberg paper.
    Consists of 3 parts:
    1. A localization net
    2. A grid generator
    3. A sampling (roi-pooled) module
    The current implementation uses a very small convolutional net with
    four convolutional layers and two fully connected layers. Backends
    can be swapped in favor of VGG, ResNets etc.

    Returns:
        A roi feature map with the same spatial dimensions as the input feature map.
    """
    def __init__(self, in_channels, spatial_dims, kernel_size, use_dropout=False):
        super(SpatialTransformer, self).__init__()
        self._h, self._w = spatial_dims
        self._in_ch = in_channels
        self._ksize = kernel_size
        self.dropout = use_dropout

        # localization net
        self.conv1 = nn.Conv2d(in_channels, 32, kernel_size=self._ksize, stride=1, padding=1, bias=False)  # size: [1x3x32x32]
        self.conv2 = nn.Conv2d(32, 32, kernel_size=self._ksize, stride=1, padding=1, bias=False)
        self.conv3 = nn.Conv2d(32, 32, kernel_size=self._ksize, stride=1, padding=1, bias=False)
        self.conv4 = nn.Conv2d(32, 32, kernel_size=self._ksize, stride=1, padding=1, bias=False)
        self.fc1 = nn.Linear(32*4*4, 1024)
        self.fc2 = nn.Linear(1024, 6)  # the 6 parameters of the 2x3 affine matrix

    def forward(self, x):
        """Forward pass of the STN module. x -> input feature map"""
        batch_images = x
        # detach: the localization branch does not back-propagate into the input
        x = F.relu(self.conv1(x.detach()))
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2)
        x = F.relu(self.conv3(x))
        x = F.max_pool2d(x, 2)
        x = F.relu(self.conv4(x))  # the original called conv3 twice here, leaving conv4 unused
        x = F.max_pool2d(x, 2)
        print("Pre view size:{}".format(x.size()))
        x = x.view(-1, 32*4*4)
        if self.dropout:
            x = F.dropout(self.fc1(x), p=0.5)
            x = F.dropout(self.fc2(x), p=0.5)
        else:
            x = self.fc1(x)
            x = self.fc2(x)  # params [Nx6]
        x = x.view(-1, 2, 3)  # reshape to the 2x3 affine matrix theta
        print(x.size())
        affine_grid_points = F.affine_grid(x, torch.Size((x.size(0), self._in_ch, self._h, self._w)))
        assert affine_grid_points.size(0) == batch_images.size(0), \
            "The batch sizes of the input images must be same as the generated grid."
        rois = F.grid_sample(batch_images, affine_grid_points)
        print("rois found to be of size:{}".format(rois.size()))
        return rois, affine_grid_points
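A quick standalone shape check of the module (the 32x32 input size is an assumption implied by the 32*4*4 flatten; the weights are untrained, so theta is random, but the shapes are what matter here):

stn = SpatialTransformer(in_channels=3, spatial_dims=(32, 32), kernel_size=3)
x = torch.randn(2, 3, 32, 32)
rois, grid = stn(x)
print(rois.shape, grid.shape)  # torch.Size([2, 3, 32, 32]) torch.Size([2, 32, 32, 2])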
The core is just these two lines:
affine_grid_points = F.affine_grid(x, torch.Size((x.size(0), self._in_ch, self._h, self._w)))
rois = F.grid_sample(batch_images, affine_grid_points)
The following reference is helpful for understanding these two calls:
Affine transformations in PyTorch (affine_grid)
- batch_images: the original input images.
- x: the 2×3 transformation matrix theta, produced from the input by the localization network (a stack of convolutional and fully connected layers).
- the second argument of F.affine_grid: the output shape of the affine transformation, in [N, C, H, W] format; here it is chosen so the output has the same size as the original image.
- F.affine_grid: produces affine_grid_points, the coordinate mapping between the transformed output and the input. It returns a 4-D tensor of shape [N, H, W, 2], where N, H, and W are the batch size, height, and width of the output feature map, and the last dimension holds the normalized (x, y) sampling coordinates.
- F.grid_sample: applies this mapping to the original image to produce the new image, which is then fed through the convolutional classifier; a sanity check of the pair of calls follows this list.
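With an identity theta, the grid maps every output pixel back to the same input location, so grid_sample should reproduce the input exactly. Here align_corners is passed explicitly so that affine_grid and grid_sample agree (the original listing omits it):

import torch
import torch.nn.functional as F

theta_id = torch.tensor([[[1., 0., 0.],
                          [0., 1., 0.]]])      # identity affine matrix
img = torch.rand(1, 3, 32, 32)
grid = F.affine_grid(theta_id, torch.Size((1, 3, 32, 32)), align_corners=False)
out = F.grid_sample(img, grid, align_corners=False)
print(grid.shape)                              # torch.Size([1, 32, 32, 2])
print(torch.allclose(out, img, atol=1e-6))     # True: the identity grid reproduces the input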
Because the whole pipeline is trained with ordinary supervised learning, x (the affine parameters theta) needs no annotation of its own: the localization network that predicts it is updated end-to-end by the gradients of the classification loss, as sketched below. Everything else follows from that.
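A minimal training-loop sketch under these assumptions; train_loader is a hypothetical SVHN DataLoader and the hyperparameters are placeholders. The classification loss is the only supervision signal:

import torch
import torch.nn as nn
import torch.optim as optim

model = STNSVHNet(spatial_dim=(32, 32), in_channels=3, stn_kernel_size=3,
                  kernel_size=3, num_classes=10)
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
criterion = nn.CrossEntropyLoss()

for images, labels in train_loader:   # train_loader: a hypothetical SVHN DataLoader
    optimizer.zero_grad()
    logits = model(images)
    loss = criterion(logits, labels)  # no direct supervision on theta
    loss.backward()                   # gradients flow through grid_sample and
                                      # affine_grid into the localization net
    optimizer.step()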
That concludes this code walkthrough of Spatial Transformer Networks (STN); I hope it proves helpful.