Dite-HRNet: Dynamic Lightweight High-Resolution Network for Human Pose Estimation

This post walks through Dite-HRNet: Dynamic Lightweight High-Resolution Network for Human Pose Estimation. Hopefully it offers some useful reference for solving related programming problems; interested developers, follow along and learn together!

Takeaways from reading this paper:

The paper proposes ACM (Adaptive Context Modeling) in two variants: DCM (Dense Context Modeling) and GCM (Global Context Modeling).

1. Components of ACM:

① Adaptive context pooling: a 1×1 convolution followed by a softmax, together with transpose/matrix-multiplication operations.

② Context shifting: the pooled context features pass through two 1×1 convolutions with non-linear activation (1×1 conv → BN → ReLU → 1×1 conv) and a sigmoid function.

③ The resulting channel weights are multiplied element-wise with the input feature maps.

That is the full ACM pipeline. DCM and GCM differ only in the first step, adaptive context pooling; a minimal sketch of the three steps follows.
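To make the three steps concrete, here is a minimal single-scale sketch (essentially what GCM below implements, minus its final BatchNorm; the channel count and reduction ratio are illustrative assumptions):

import torch
import torch.nn as nn
import torch.nn.functional as F

class MiniACM(nn.Module):
    """Minimal single-scale sketch of the three ACM steps."""

    def __init__(self, channels, reduction=4):
        super().__init__()
        self.conv_mask = nn.Conv2d(channels, 1, kernel_size=1)   # ① context mask
        self.shift = nn.Sequential(                              # ② context shifting
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.BatchNorm2d(channels // reduction),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid())

    def forward(self, x):
        n, c, h, w = x.size()
        # ① 1x1 conv + softmax gives one attention weight per pixel
        mask = F.softmax(self.conv_mask(x).view(n, 1, h * w), dim=2)
        # ① attention-weighted pooling via matrix multiplication
        context = torch.matmul(x.view(n, 1, c, h * w), mask.unsqueeze(-1))
        context = context.permute(0, 2, 1, 3)                    # [N, C, 1, 1]
        # ③ element-wise product of the shifted weights with the input
        return x * self.shift(context)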

2. DCM

DCM aggregates contexts across branches with different resolutions, so its adaptive context pooling (the first ACM step) is modified. Each branch's feature map is divided into regions of $(H \times W)/(H_{min} \times W_{min})$ pixels and pooled down to the smallest branch resolution $H_{min} \times W_{min}$; for example, a 64×48 branch pooled against a 32×24 bottom branch gives 4 pixels per region. The pooled maps from all branches are then concatenated.

The code is as follows, with a small shape check after the snippet:

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.utils.checkpoint as cp


class DenseContextModeling(nn.Module):

    def __init__(self, channels, reduction):
        super().__init__()
        num_branches = len(channels)
        self.reduction = reduction[num_branches - 2]
        self.channels = channels
        total_channel = sum(channels)
        mid_channels = total_channel // self.reduction
        # adaptive context pooling in ACM: a context mask made of a 1x1 convolution and a softmax
        self.conv_mask = nn.ModuleList([
            nn.Conv2d(channels[i], 1, kernel_size=1, stride=1, padding=0, bias=True)
            for i in range(len(channels))
        ])
        self.softmax = nn.Softmax(dim=2)
        # context shifting: two 1x1 convolutions and a sigmoid
        self.channel_attention = nn.Sequential(
            nn.Conv2d(total_channel, mid_channels, kernel_size=1, stride=1, padding=0, bias=False),
            nn.BatchNorm2d(mid_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_channels, total_channel, kernel_size=1, stride=1, padding=0, bias=True),
            nn.Sigmoid())

    # the actual adaptive context pooling: pools one branch down to the smallest resolution
    def global_spatial_pool(self, x, mini_size, i):
        batch, channel, height, width = x.size()
        mini_height, mini_width = mini_size
        # [N, C, H, W]
        x_m = x
        # [N, C, H * W]
        x_m = x_m.view(batch, channel, height * width)
        # [N, MH * MW, C, (H * W) / (MH * MW)]
        x_m = x_m.view(batch, mini_height * mini_width, channel,
                       (height * width) // (mini_height * mini_width))
        # [N, 1, H, W]
        mask = self.conv_mask[i](x)
        # [N, 1, H * W]
        mask = mask.view(batch, 1, height * width)
        # [N, 1, H * W]
        mask = self.softmax(mask)
        # [N, MH * MW, (H * W) / (MH * MW)]
        mask = mask.view(batch, mini_height * mini_width,
                         (height * width) // (mini_height * mini_width))
        # [N, MH * MW, (H * W) / (MH * MW), 1]
        mask = mask.unsqueeze(-1)
        # [N, MH * MW, C, 1]
        x = torch.matmul(x_m, mask)
        # [N, C, MH * MW, 1]
        x = x.permute(0, 2, 1, 3)
        # [N, C, MH, MW]
        x = x.view(batch, channel, mini_height, mini_width)
        return x

    def forward(self, x):
        mini_size = x[-1].size()[-2:]
        out = [self.global_spatial_pool(s, mini_size, i)
               for s, i in zip(x[:-1], range(len(x)))] + [x[-1]]
        out = torch.cat(out, dim=1)
        out = self.channel_attention(out)
        out = torch.split(out, self.channels, dim=1)
        out = [s * F.interpolate(a, size=s.size()[-2:], mode='nearest')
               for s, a in zip(x, out)]
        return out
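A quick shape check of DCM with two branches (the branch widths, input sizes and reduction list are illustrative assumptions, not the paper's configuration):

dcm = DenseContextModeling(channels=[40, 80], reduction=[8, 8])
feats = [torch.randn(2, 40, 64, 48),   # high-resolution branch
         torch.randn(2, 80, 32, 24)]   # lowest-resolution branch
out = dcm(feats)
print([tuple(o.shape) for o in out])   # [(2, 40, 64, 48), (2, 80, 32, 24)]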

3. GCM

GCM applies the same adaptive context modeling within a single branch: its context pooling aggregates over the entire spatial extent, yielding one 1×1 global context vector, and the reweighted output additionally passes through a BatchNorm.

The code:

class GlobalContextModeling(nn.Module):

    def __init__(self, channels, num_branch, reduction, with_cp=False):
        super().__init__()
        self.with_cp = with_cp
        self.reduction = reduction[num_branch]
        mid_channels = channels // self.reduction
        self.conv_mask = nn.Conv2d(channels, 1, kernel_size=1, stride=1, padding=0, bias=True)
        self.softmax = nn.Softmax(dim=2)
        self.channel_attention = nn.Sequential(
            nn.Conv2d(channels, mid_channels, kernel_size=1, stride=1, padding=0, bias=False),
            nn.BatchNorm2d(mid_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_channels, channels, kernel_size=1, stride=1, padding=0, bias=True),
            nn.Sigmoid())
        self.bn = nn.BatchNorm2d(channels)

    def global_spatial_pool(self, x):
        batch, channel, height, width = x.size()
        # [N, C, H, W]
        x_m = x
        # [N, C, H * W]
        x_m = x_m.view(batch, channel, height * width)
        # [N, 1, C, H * W]
        x_m = x_m.unsqueeze(1)
        # [N, 1, H, W]
        mask = self.conv_mask(x)
        # [N, 1, H * W]
        mask = mask.view(batch, 1, height * width)
        # [N, 1, H * W]
        mask = self.softmax(mask)
        # [N, 1, H * W, 1]
        mask = mask.unsqueeze(-1)
        # [N, 1, C, 1]
        x = torch.matmul(x_m, mask)
        # [N, C, 1, 1]
        x = x.permute(0, 2, 1, 3)
        return x

    def forward(self, x):

        def _inner_forward(x):
            identity = x
            x = self.global_spatial_pool(x)
            x = self.channel_attention(x)
            x = self.bn(identity * x)
            return x

        if self.with_cp and x.requires_grad:
            x = cp.checkpoint(_inner_forward, x)
        else:
            x = _inner_forward(x)

        return x
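A corresponding shape check for GCM on a single branch (sizes again illustrative):

gcm = GlobalContextModeling(channels=40, num_branch=0, reduction=[8, 8])
y = gcm(torch.randn(2, 40, 64, 48))
print(tuple(y.shape))  # (2, 40, 64, 48): the 1x1 global context reweights every pixel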

4. DSC

The DSC (Dynamic Split Convolution) code; sketches of the helper functions it uses follow the snippet:

class DynamicSplitConvolution(nn.Module):

    def __init__(self, channels, stride, num_branch, num_groups, num_kernels, with_cp=False):
        super().__init__()
        self.with_cp = with_cp
        self.num_groups = num_groups[num_branch]
        self.num_kernels = num_kernels[num_branch]
        # split the channels into groups, one depth-wise dynamic convolution per group,
        # with a different kernel size (3, 5, 7, ...) for each group
        self.split_channels = _split_channels(channels, self.num_groups)
        self.conv = nn.ModuleList([
            ConvBN(
                self.split_channels[i],
                self.split_channels[i],
                kernel_size=i * 2 + 3,
                stride=stride,
                padding=i + 1,
                groups=self.split_channels[i],
                num_kernels=self.num_kernels)
            for i in range(self.num_groups)
        ])

    def forward(self, x):

        def _inner_forward(x):
            if self.num_groups == 1:
                x = self.conv[0](x)
            else:
                x_split = torch.split(x, self.split_channels, dim=1)
                x = [conv(t) for conv, t in zip(self.conv, x_split)]
                x = torch.cat(x, dim=1)
                # mix information between the channel groups
                x = channel_shuffle(x, self.num_groups)
            return x

        if self.with_cp and x.requires_grad:
            x = cp.checkpoint(_inner_forward, x)
        else:
            x = _inner_forward(x)

        return x
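The snippet relies on three helpers from the Dite-HRNet repository: _split_channels, channel_shuffle and ConvBN. For completeness, here are sketches consistent with common implementations; in particular, the ConvBN shown here is an assumption that simply wraps the dynamic convolution (DynamicKernelAggregation, defined in the next section) with a BatchNorm:

def _split_channels(channels, num_groups):
    # split `channels` as evenly as possible; the first group absorbs the remainder
    split = [channels // num_groups] * num_groups
    split[0] += channels - sum(split)
    return split

def channel_shuffle(x, groups):
    # ShuffleNet-style channel shuffle (assumes channels divisible by groups)
    batch, channels, height, width = x.size()
    x = x.view(batch, groups, channels // groups, height, width)
    x = x.transpose(1, 2).contiguous()
    return x.view(batch, channels, height, width)

class ConvBN(nn.Module):
    # assumed wrapper: dynamic convolution followed by BatchNorm
    def __init__(self, in_channels, out_channels, kernel_size, stride, padding, groups, num_kernels):
        super().__init__()
        self.conv = DynamicKernelAggregation(
            in_channels, out_channels, kernel_size=kernel_size, stride=stride,
            padding=padding, groups=groups, bias=False, num_kernels=num_kernels)
        self.bn = nn.BatchNorm2d(out_channels)

    def forward(self, x):
        return self.bn(self.conv(x))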

All the convolution kernels used here are generated by dynamic convolution.

Dynamic convolution gives each input feature map K different kernels (the kernels share the same size but have different parameter values). To generate and apply the K kernels, the implementation builds directly on PyTorch's F.conv2d. First, per-sample attention weights over the K kernels are produced with an SE-style (squeeze-and-excitation) module. F.conv2d, however, expects a single weight tensor for the whole batch, which does not match per-sample kernels. The trick is to reshape the input to [1, batch_size × in_channels, H, W] and set groups = batch_size, so that each group corresponds to one sample's input channels and each sample is convolved with its own kernel aggregated from the K candidates, as the toy demonstration below shows.
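Here is a toy demonstration of that batch-folding trick; all sizes are arbitrary:

import torch
import torch.nn.functional as F

batch, cin, cout, k, K = 2, 4, 4, 3, 4
x = torch.randn(batch, cin, 8, 8)
attention = torch.rand(batch, K)             # per-sample weights over the K kernels
bank = torch.randn(K, cout, cin, k, k)       # K candidate kernels of identical shape
w = torch.mm(attention, bank.view(K, -1))    # aggregate: one kernel per sample
w = w.view(batch * cout, cin, k, k)
y = F.conv2d(x.view(1, batch * cin, 8, 8), w, padding=1, groups=batch)
y = y.view(batch, cout, 8, 8)                # each sample was convolved with its own kernel
print(y.shape)                               # torch.Size([2, 4, 8, 8])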

class KernelAttention(nn.Module):

    def __init__(self, channels, reduction=4, num_kernels=4, init_weight=True):
        super().__init__()
        if channels != 3:
            mid_channels = channels // reduction
        else:
            mid_channels = num_kernels
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.conv1 = nn.Conv2d(channels, mid_channels, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(mid_channels)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(mid_channels, num_kernels, kernel_size=1, bias=True)
        self.sigmoid = nn.Sigmoid()
        if init_weight:
            self._initialize_weights()

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            if isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)

    def forward(self, x):
        # SE-style squeeze-and-excitation: global average pooling followed by
        # two 1x1 convolutions, producing one weight per candidate kernel
        x = self.avg_pool(x)
        x = self.conv1(x)
        x = self.bn(x)
        x = self.relu(x)
        x = self.conv2(x).view(x.shape[0], -1)
        x = self.sigmoid(x)
        return x


class KernelAggregation(nn.Module):

    def __init__(self, in_channels, out_channels, kernel_size, stride, padding,
                 dilation, groups, bias, num_kernels, init_weight=True):
        super().__init__()
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.kernel_size = kernel_size
        self.stride = stride
        self.padding = padding
        self.dilation = dilation
        self.groups = groups
        self.bias = bias
        self.num_kernels = num_kernels
        # K candidate kernels of identical shape but independent parameters
        self.weight = nn.Parameter(
            torch.randn(num_kernels, out_channels, in_channels // groups, kernel_size, kernel_size),
            requires_grad=True)
        if bias:
            self.bias = nn.Parameter(torch.zeros(num_kernels, out_channels))
        else:
            self.bias = None
        if init_weight:
            self._initialize_weights()

    def _initialize_weights(self):
        for i in range(self.num_kernels):
            nn.init.kaiming_uniform_(self.weight[i])

    def forward(self, x, attention):
        batch_size, in_channels, height, width = x.size()
        # fold the batch into the channel dimension so that every sample
        # can be convolved with its own aggregated kernel
        x = x.contiguous().view(1, batch_size * self.in_channels, height, width)
        # aggregate the K candidate kernels with the per-sample attention weights
        weight = self.weight.contiguous().view(self.num_kernels, -1)
        weight = torch.mm(attention, weight).contiguous().view(
            batch_size * self.out_channels,
            self.in_channels // self.groups,
            self.kernel_size,
            self.kernel_size)
        if self.bias is not None:
            bias = torch.mm(attention, self.bias).contiguous().view(-1)
            x = F.conv2d(
                x, weight=weight, bias=bias,
                stride=self.stride, padding=self.padding, dilation=self.dilation,
                groups=self.groups * batch_size)
        else:
            x = F.conv2d(
                x, weight=weight, bias=None,
                stride=self.stride, padding=self.padding, dilation=self.dilation,
                groups=self.groups * batch_size)
        # unfold the batch dimension again
        x = x.contiguous().view(batch_size, self.out_channels, x.shape[-2], x.shape[-1])
        return x


class DynamicKernelAggregation(nn.Module):

    def __init__(self, in_channels, out_channels, kernel_size, stride=1, padding=0,
                 dilation=1, groups=1, bias=True, num_kernels=4):
        super().__init__()
        assert in_channels % groups == 0
        self.attention = KernelAttention(in_channels, num_kernels=num_kernels)
        self.aggregation = KernelAggregation(
            in_channels, out_channels, kernel_size=kernel_size, stride=stride,
            padding=padding, dilation=dilation, groups=groups, bias=bias,
            num_kernels=num_kernels)

    def forward(self, x):
        attention = x
        attention = self.attention(attention)
        x = self.aggregation(x, attention)
        return x
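A quick shape check of the complete dynamic convolution (hypothetical sizes):

conv = DynamicKernelAggregation(16, 32, kernel_size=3, padding=1, num_kernels=4)
y = conv(torch.randn(2, 16, 24, 24))
print(tuple(y.shape))  # (2, 32, 24, 24); each sample used its own aggregated kernel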


This concludes our introduction to Dite-HRNet: Dynamic Lightweight High-Resolution Network for Human Pose Estimation. We hope the article is helpful to fellow developers!


