This post is a walkthrough of the paper Learning to Fuse Asymmetric Feature Maps in Siamese Trackers.
Learning to Fuse Asymmetric Feature Maps in Siamese Trackers
Paper link · code link (in practice, no code was ever actually released)
1. Introduction
Characteristics of SiamRPN:

- SiamRPN formulates the tracking problem as one-shot detection.
- SiamRPN introduces a region proposal network (RPN) and uses up-channel cross-correlation (UP-XCorr).
- UP-XCorr imbalances the parameter distribution, making the training optimization hard.
Characteristics of SiamRPN++:

- SiamRPN++ introduces depthwise correlation (DW-Corr) to efficiently generate a multi-channel correlation feature map and address the imbalance of the parameter distribution.
- Limitation 1: DW-Corr produces similar correlation responses for the target and for distractors of homogeneous appearance, which makes it difficult for the RPN to discriminate the desired target from distractors: some false targets in the search region look very much like the template, and their responses in the correlation map are also high.
- Limitation 2: Only a few channels in the DW-Corr feature map are activated; the remaining channels are largely redundant. For DW-Corr to work, the features of different targets are expected to be orthogonal and distributed in different channels, so that the correlation channels belonging to other targets are suppressed (low response) and only the few channels matching the given target are activated (high response). Moreover, DW-Corr often produces responses at irrelevant background; as a consequence, the correlation maps are often blurry, lack clear boundaries (target and background blur together), and hinder the RPN from making accurate and robust predictions.
2. Related Work
1. MDNet: MDNet employs a CNN trained offline on multiple annotated videos. During evaluation, it learns a domain-specific detector online to discriminate the foreground from the background.
2. ATOM: ATOM comprises two dedicated components: a target estimation module trained offline and a classification module trained online.
3. DiMP: DiMP employs a meta-learning based architecture, trained offline, that predicts the weights of the target model.
4. KYS: KYS extends DiMP by exploiting scene information (spatio-temporal information across frames) to improve the results.
5. SiamFC: SiamFC first introduced the XCorr layer to combine feature maps.
3. Method
3.1 Siamese Networks for Tracking
Siamese networks formulate the tracking task as learning a general similarity map between the feature maps extracted from the target template and the search region. When certain sliding windows in the search region are similar to the template, responses in these windows are high.
$$c = f(\overline{z}, \overline{x}) = \varphi(z;\theta) * \varphi(x;\theta)$$

where

- $\varphi$ is the backbone network,
- $\overline{z} = \varphi(z;\theta) \in \mathbb{R}^{C\times \eta \times \omega}$ is the template feature map,
- $\overline{x} = \varphi(x;\theta) \in \mathbb{R}^{C\times H \times W}$ is the search-region feature map,
- $f$ is the function that combines the two feature maps into a similarity response map.
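To make the shapes concrete, here is a minimal PyTorch sketch of this plain cross-correlation (XCorr). It is not the authors' code (none was released); the function and variable names are illustrative, and the toy sizes are assumptions.

```python
import torch
import torch.nn.functional as F

def xcorr(z_feat: torch.Tensor, x_feat: torch.Tensor) -> torch.Tensor:
    """Plain cross-correlation: the template feature map z_feat
    (C, eta, omega) is used as a single convolution kernel over the
    search feature map x_feat (C, H, W), producing a one-channel
    similarity map of shape (1, H-eta+1, W-omega+1)."""
    # conv2d expects (N, C, H, W) input and (out_ch, C, kh, kw) weights
    return F.conv2d(x_feat.unsqueeze(0), z_feat.unsqueeze(0)).squeeze(0)

# toy shapes: C=256, template 7x7, search 31x31
z = torch.randn(256, 7, 7)
x = torch.randn(256, 31, 31)
print(xcorr(z, x).shape)  # torch.Size([1, 25, 25])
```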
SiamRPN++ introduces DW-Corr to address the imbalance of the parameter distribution and efficiently generate a multi-channel correlation response map:

$$c_{dw} = f(\overline{z}, \overline{x}) = \overline{z} \otimes \overline{x}, \qquad c_{dw} \in \mathbb{R}^{N \times (H-\eta+1)\times (W-\omega +1)}$$

where $\otimes$ denotes the depthwise convolution between the two feature maps: each channel of the template feature map slides over the matching channel of the search feature map.
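A minimal PyTorch sketch of DW-Corr, implemented with a grouped convolution (one group per channel). This is an illustration under the shapes above, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def dw_corr(z_feat: torch.Tensor, x_feat: torch.Tensor) -> torch.Tensor:
    """Depthwise correlation (DW-Corr): each channel of the template
    feature map correlates only with the same channel of the search
    feature map, so the output keeps one response map per channel."""
    c = z_feat.size(0)
    # groups=c makes conv2d per-channel: weight shape (c, 1, kh, kw)
    return F.conv2d(x_feat.unsqueeze(0),
                    z_feat.unsqueeze(1),
                    groups=c).squeeze(0)

z = torch.randn(256, 7, 7)    # template feature, (C, eta, omega)
x = torch.randn(256, 31, 31)  # search feature, (C, H, W)
print(dw_corr(z, x).shape)    # torch.Size([256, 25, 25])
```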
3.2 Asymmetric Convolution
To circumvent the expensive computation, the authors introduce a mathematically equivalent procedure, called the asymmetric convolution (AC), which replaces the direct convolution on the concatenated feature map with two independent convolutions followed by a broadcast addition.
$$v_i = \begin{bmatrix} \theta_z & \theta_x \end{bmatrix} * \begin{bmatrix} \overline{z} \\ \overline{x}_i \end{bmatrix} = \theta_z * \overline{z} + \theta_x * \overline{x}_i$$

$$v = \{v_i \mid i \in [1, n]\} = \{\theta_z * \overline{z} \; +_b \; \theta_x * \overline{x}_i \mid i \in [1, n]\} = \theta_z * \overline{z} \; +_b \; \theta_x * \overline{x}$$

where

- $\overline{x} \in \mathbb{R}^{H\times W\times C}$ is the search-region feature map produced by the backbone,
- $\overline{z} \in \mathbb{R}^{\eta \times \omega \times C}$ is the template feature map produced by the backbone,
- $\theta_x * \overline{x} \in \mathbb{R}^{(H-\eta+1)\times(W-\omega +1)\times P}$ is the response map obtained by passing the search feature through a head convolution whose kernel size equals the template size $[\eta, \omega]$,
- $\theta_z * \overline{z} \in \mathbb{R}^{1\times1\times P}$ is the response obtained by passing the template feature through a head convolution of its own size $[\eta, \omega]$, giving a $[1, 1, P]$ output,
- $+_b$, similar in meaning to $\oplus$, denotes addition after broadcasting: the $[1, 1, P]$ template response is broadcast to the same shape $[H-\eta+1, W-\omega+1, P]$ as the search response and then added. This is the key step: the convolution in DW-Corr is turned into an addition, which naturally lowers the computational cost.
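A minimal PyTorch sketch of the asymmetric convolution under these shape assumptions. The module and parameter names (AsymmetricConv, theta_x, theta_z) are hypothetical, since no official code exists.

```python
import torch
import torch.nn as nn

class AsymmetricConv(nn.Module):
    """Sketch of the asymmetric convolution (AC): two independent
    convolutions on the template and search features, fused by a
    broadcast addition instead of a sliding correlation."""
    def __init__(self, c: int, eta: int, omega: int, p: int):
        super().__init__()
        # theta_x: convolves the search feature with a kernel the
        # size of the template, giving (P, H-eta+1, W-omega+1)
        self.theta_x = nn.Conv2d(c, p, kernel_size=(eta, omega))
        # theta_z: convolves the template feature with a kernel of
        # its own spatial size, giving a (P, 1, 1) response
        self.theta_z = nn.Conv2d(c, p, kernel_size=(eta, omega))

    def forward(self, z_feat, x_feat):
        # broadcast add (+_b): the (N, P, 1, 1) template response is
        # broadcast over the spatial grid of the search response,
        # replacing DW-Corr's sliding convolution with an addition
        return self.theta_x(x_feat) + self.theta_z(z_feat)

ac = AsymmetricConv(c=256, eta=7, omega=7, p=256)
z = torch.randn(1, 256, 7, 7)    # template feature
x = torch.randn(1, 256, 31, 31)  # search feature
print(ac(z, x).shape)  # torch.Size([1, 256, 25, 25])
```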