
2024-02-11 04:40
Tags: mobiledets


MobileDets: Searching for Object Detection Architectures for Mobile Accelerators



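Inverted bottleneck (IBN) modules are the main building block of state-of-the-art mobile models. The authors observe, however, that although IBNs benefit from depthwise-separable convolutions, which substantially reduce parameter count and FLOPs, depthwise convolutions are often poorly optimized on some modern mobile accelerators (EdgeTPU accelerators and Qualcomm DSPs).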


It is observed that for certain tensor shapes and kernel dimensions, a regular convolution can utilize the hardware up to 3× more efficiently than the depthwise variation on an EdgeTPU despite the much larger amount of theoretical computation cost (7× more FLOPs). In other words, regular convolutions can make better use of mobile accelerators.
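To get a feel for the size of the FLOPs gap, the sketch below counts multiply-accumulates (MACs) for a regular convolution and for a depthwise-separable one. The layer shape is hypothetical and chosen only for illustration; the notes do not say which shapes the 7× figure refers to, and the exact ratio depends on kernel size and channel counts.

```python
def regular_conv_macs(h, w, k, c_in, c_out):
    """MACs of a k x k regular convolution (stride 1, 'same' padding)."""
    return h * w * k * k * c_in * c_out


def depthwise_separable_macs(h, w, k, c_in, c_out):
    """MACs of a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = h * w * k * k * c_in
    pointwise = h * w * c_in * c_out
    return depthwise + pointwise


# Hypothetical layer shape, not taken from the paper.
h, w, k, c_in, c_out = 20, 20, 3, 64, 64
regular = regular_conv_macs(h, w, k, c_in, c_out)
separable = depthwise_separable_macs(h, w, k, c_in, c_out)
ratio = regular / separable
print(f"regular: {regular:,} MACs, separable: {separable:,} MACs, ratio: {ratio:.2f}x")
```

MAC counts are only a proxy: the point of the observation above is that measured latency on an accelerator does not have to follow this ratio.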


The authors therefore propose a new search space that includes IBNs and full convolution sequences motivated by the structure of tensor decomposition [34,6], called TDB (Tensor-Decomposition-Based search space).


By learning to leverage full convolutions at selected positions in the network, the method outperforms IBN-only models by a significant margin. The resulting MobileDets:

  1. outperform MobileNetV2 by 1.9 mAP on mobile CPU, 3.7 mAP on EdgeTPU and 3.4 mAP on DSP;
  2. outperform the state-of-the-art MobileNetV3 classification backbone by 1.7 mAP at similar CPU inference efficiency;
  3. achieve comparable performance with the state-of-the-art mobile CPU detector, MnasFPN, while running about 2× faster.


The search employs TuNAS [1] for its scalability and its reliable improvement over random baselines.

A brief introduction to TuNAS:

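TuNAS builds a one-shot model that covers all architectural choices in the search space, together with a controller whose goal is to pick an architecture that optimizes a platform-aware reward function.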

The one-shot model and the controller are trained together during search.

  1. In each step, the controller samples a random architecture from a multinomial distribution that spans the choices;
  2. then the portion of the one-shot model's weights associated with the sampled architecture is updated;
  3. finally, a reward is computed for the sampled architecture and used to update the controller.
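The three steps above can be sketched as a toy loop. This is a minimal illustration under stated assumptions, not the TuNAS implementation: the search space (3 decisions with 4 choices each), the `toy_reward` function, the moving-average baseline and the learning rate are all made up for the example, and the one-shot weight update of step 2 is left out.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical search space: 3 decisions, each with 4 candidate operations.
num_decisions, num_choices = 3, 4
logits = np.zeros((num_decisions, num_choices))  # controller parameters


def softmax(x):
    z = np.exp(x - x.max(axis=-1, keepdims=True))
    return z / z.sum(axis=-1, keepdims=True)


def toy_reward(arch):
    # Stand-in for the platform-aware reward: choice 2 is best at every decision.
    return float(np.mean(arch == 2))


baseline, lr = 0.0, 0.5
for step in range(500):
    probs = softmax(logits)
    # 1. Sample one choice per decision from the multinomial distribution.
    arch = np.array([rng.choice(num_choices, p=p) for p in probs])
    # 2. (Omitted) update only the one-shot weights used by `arch`.
    # 3. Compute the reward and apply a REINFORCE update to the controller.
    reward = toy_reward(arch)
    advantage = reward - baseline
    baseline = 0.9 * baseline + 0.1 * reward  # moving-average baseline
    one_hot = np.eye(num_choices)[arch]
    # (one_hot - probs) is the gradient of the log-probability w.r.t. the logits.
    logits += lr * advantage * (one_hot - probs)

final_probs = softmax(logits)
print(final_probs.round(2))
```

Over the course of the loop the controller shifts probability mass toward the choices that earn a higher reward; in the real search the reward comes from the detector's accuracy and the device cost rather than from a fixed rule.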

The update is given by applying the standard REINFORCE algorithm [37] to a reward function in which:

  • mAP(M) denotes the detection mAP of an architecture M,
  • c(M) is the inference cost (in this case, latency)
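The TuNAS-style reward is commonly written as R(M) = mAP(M) + τ · |c(M)/c0 − 1|, where c0 is the target cost and τ < 0 controls how strongly deviations from the target are penalized. The symbols c0 and τ are not defined in the notes above, and the default value of `tau` below is only an illustrative assumption. A minimal sketch:

```python
def absolute_reward(map_score, cost, target_cost, tau=-0.07):
    """TuNAS-style reward: quality plus a penalty for missing the cost target.

    tau must be negative; the default here is an illustrative assumption.
    """
    if tau >= 0:
        raise ValueError("tau should be negative so that deviations are penalized")
    return map_score + tau * abs(cost / target_cost - 1.0)


# Hypothetical numbers: 0.25 mAP at 12 ms with a 10 ms latency target.
print(absolute_reward(0.25, cost=12.0, target_cost=10.0))
```

Because of the absolute value, a model that is faster than the target is penalized just like one that is slower, which steers the search toward architectures close to the latency budget instead of simply the cheapest ones.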


To speed up the search, mAP(M) is estimated on a small mini-batch.


Benchmarking every network in the search space to obtain c(M) is infeasible, so a cost model (a linear regression model) is trained to estimate a model's inference cost.

The cost model’s features are composed of, for each layer, an indicator of the cross product between input/output channel sizes and layer type.


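To collect training data for the cost model, the authors randomly sample several thousand network architectures from the search space and benchmark each of them on device.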
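A linear cost model over per-layer indicator features can be sketched as follows. Everything concrete here is an assumption made for illustration: the tiny search space (`NUM_LAYERS`, `LAYER_TYPES`, `CHANNELS`), the synthetic `benchmark` function that stands in for on-device measurements, and the use of ordinary least squares for fitting.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical search space: every layer picks a type and an in/out channel pair.
NUM_LAYERS = 4
LAYER_TYPES = ["ibn", "full_conv"]
CHANNELS = [(16, 16), (16, 32), (32, 32)]
OPTIONS = [(t, c) for t in LAYER_TYPES for c in CHANNELS]


def featurize(arch):
    """One indicator per (layer position, layer type, in/out channels) combination."""
    x = np.zeros(NUM_LAYERS * len(OPTIONS))
    for layer, option_index in enumerate(arch):
        x[layer * len(OPTIONS) + option_index] = 1.0
    return x


# Synthetic stand-in for on-device benchmarks: additive per-layer latency plus noise.
true_layer_cost = rng.uniform(0.5, 3.0, size=NUM_LAYERS * len(OPTIONS))


def benchmark(arch):
    return featurize(arch) @ true_layer_cost + rng.normal(scale=0.01)


# Randomly sample architectures and "measure" each one.
archs = rng.integers(len(OPTIONS), size=(2000, NUM_LAYERS))
X = np.stack([featurize(a) for a in archs])
y = np.array([benchmark(a) for a in archs])

# Fit the linear cost model by least squares.
weights, *_ = np.linalg.lstsq(X, y, rcond=None)

test_arch = rng.integers(len(OPTIONS), size=NUM_LAYERS)
predicted = featurize(test_arch) @ weights
actual = featurize(test_arch) @ true_layer_cost
print(f"predicted {predicted:.3f} ms vs. noise-free {actual:.3f} ms")
```

Once fitted, the model predicts the cost of any architecture as a sum of learned per-layer terms, so the search can query c(M) without touching the device again.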




