The Handful of Image Features and Methods Computer Vision Relies On Today

http://www.zhizhihu.com/html/y2010/2431.html

December 12, 2010 ⁄ Technology, Research, Cool Images

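I have been following Action Classification for a while. After the VOC2010 results were released I skimmed through them: nearly every entry uses the same image features (dense SIFT + Spatial Pyramid), followed by some ad hoc fusion scheme, which in the end comes down to Multiple Kernel Learning or an approximation of it.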

Here are the VOC2010 results for the Action Classification task:

Average Precision (AP %)

Method	phoning	playing instrument	reading	riding bike	riding horse	running	taking photo	using computer	walking
BONN_ACTION	47.5	51.1	31.9	64.5	69.1	78.5	32.4	53.9	61.1
CVC_BASE	56.2	56.5	34.7	75.1	83.6	86.5	25.4	60.0	69.2
CVC_SEL	49.8	52.8	34.3	74.2	85.5	85.1	24.9	64.1	72.5
INRIA_SPM_HT	53.2	53.6	30.2	78.2	88.4	84.6	30.4	60.9	61.8
NUDT_SVM_WHGO_SIFT_CENTRIST_LLM	47.2	47.9	24.5	74.2	81.0	79.5	24.9	58.6	71.5
SURREY_MK_KDA	52.6	53.5	35.9	81.0	89.3	86.5	32.8	59.2	68.6
UCLEAR_SVM_DOSP_MULTFEATS	47.0	57.8	26.9	78.8	89.7	87.3	32.5	60.0	70.1
UMCO_DHOG_KSVM	53.5	43.0	32.0	67.9	68.8	83.0	34.1	45.9	60.4
WILLOW_A_SVMSIFT_1-A_LSVM	49.2	37.7	22.2	73.2	77.1	81.7	24.3	53.7	56.9
WILLOW_LSVM	40.4	29.9	32.2	53.5	62.2	73.6	17.6	45.8	41.5
WILLOW_SVMSIFT	47.9	29.1	21.7	53.5	76.7	78.3	26.0	42.9	56.4

Descriptions of the individual methods follow the results.

First, the UCLEAR_SVM_DOSP_MULTFEATS method:

Multiple chi squared kernels are computed: spatial pyramid (SP) w/ dense SIFT, dense overlapping SP w/ HOG, texture filter, LAB values (bag-of-words w/ the above features) and edge dir hists. They are computed on full images, person bounding boxes (BB) and BB of the lower part (simple stretch-scale of person BB) expected to contain horse, bike etc. They are combined with class specific binary weights based on their perf on val set. Finally, class specific SVMs trained on train+val.
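To make the "multiple chi squared kernels combined with binary weights" idea concrete, here is a minimal sketch in plain Python. It is not the UCLEAR code: the exponential form of the kernel, the `gamma` parameter, the toy histograms, and the function names are all assumptions chosen for illustration. In practice the binary weights would be picked per class from validation performance, as the description says, and the combined Gram matrix would be handed to an SVM that accepts precomputed kernels.

```python
import math

def chi2_distance(h1, h2, eps=1e-10):
    """Chi-squared distance between two non-negative histograms."""
    return 0.5 * sum((a - b) ** 2 / (a + b + eps) for a, b in zip(h1, h2))

def chi2_kernel(hists, gamma=1.0):
    """Gram matrix K[i][j] = exp(-gamma * chi2(h_i, h_j)) (assumed kernel form)."""
    n = len(hists)
    return [[math.exp(-gamma * chi2_distance(hists[i], hists[j]))
             for j in range(n)] for i in range(n)]

def combine_kernels(kernels, weights):
    """Weighted sum of Gram matrices; a binary weight switches a channel on or off."""
    n = len(kernels[0])
    combined = [[0.0] * n for _ in range(n)]
    for K, w in zip(kernels, weights):
        for i in range(n):
            for j in range(n):
                combined[i][j] += w * K[i][j]
    return combined

# Toy histograms for three images and two feature channels (made-up values).
sift_hists = [[0.5, 0.5, 0.0], [0.4, 0.6, 0.0], [0.0, 0.1, 0.9]]
hog_hists = [[0.2, 0.8], [0.3, 0.7], [0.9, 0.1]]

K_sift = chi2_kernel(sift_hists)
K_hog = chi2_kernel(hog_hists)

# Hypothetical binary weights for one class: keep SIFT, drop HOG.
K = combine_kernels([K_sift, K_hog], weights=[1, 0])
```

With weights `[1, 0]` the combined matrix is just `K_sift`; with `[1, 1]` both channels contribute equally.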

Doesn't that look simple?

Next, the SURREY_MK_KDA method:

Kernel-level fusion with Spatial Pyramid Grids, Soft Assignment and Kernel Discriminant Analysis using spectral regression. 18 kernels have been generated from 18 variants of SIFT.

In other words, fusion again.
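The soft assignment step mentioned in that description can be sketched roughly as follows. This is only one common variant (Gaussian weights over all codewords, normalized per descriptor), not necessarily what the Surrey entry used; `sigma`, the toy codebook, and the function names are assumptions for illustration.

```python
import math

def soft_assign(descriptor, codebook, sigma=1.0):
    """Normalized Gaussian weights of one descriptor over all codewords."""
    weights = []
    for word in codebook:
        sq_dist = sum((d - w) ** 2 for d, w in zip(descriptor, word))
        weights.append(math.exp(-sq_dist / (2.0 * sigma ** 2)))
    total = sum(weights)
    return [w / total for w in weights]

def soft_bow_histogram(descriptors, codebook, sigma=1.0):
    """Bag-of-words histogram where each descriptor votes for every codeword."""
    hist = [0.0] * len(codebook)
    if not descriptors:
        return hist
    for desc in descriptors:
        for k, w in enumerate(soft_assign(desc, codebook, sigma)):
            hist[k] += w
    return [h / len(descriptors) for h in hist]

# Toy 2-D "descriptors" and a 3-word codebook (made-up values).
codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]
descriptors = [[0.1, 0.0], [0.9, 1.2], [0.5, 0.5]]
hist = soft_bow_histogram(descriptors, codebook)
```

Compared with hard assignment, a descriptor that falls between two codewords (like `[0.5, 0.5]` here) splits its vote instead of being forced into one bin.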

The CVC_SEL method:

Enhanced CVC submission built upon CVC-BASE for action recognition. Standard BoW model over multiple features from CVC-BASE plus contextual object descriptors. Cross-validation procedure for action-specific feature and kernel selection. Foreground/background/neighborhood modeled separately, spatial pyramid over several features for foreground representation. Object detection based on deformable part-based detector incorporated. Late fusion of feature-specific SVM outputs for final action score.
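Late fusion, as opposed to kernel-level fusion, combines the outputs of classifiers that were each trained on a single feature. The description does not say which fusion rule CVC used, so the weighted average below is an assumption, and the feature names and scores are made up for illustration.

```python
def late_fusion(scores_per_feature, weights=None):
    """Fuse per-feature classifier scores into one score per image.

    scores_per_feature maps a feature name to a list of decision values,
    one per image. weights maps the same names to non-negative weights;
    if omitted, every feature counts equally.
    """
    names = sorted(scores_per_feature)
    if weights is None:
        weights = {name: 1.0 for name in names}
    total = sum(weights[name] for name in names)
    n_images = len(scores_per_feature[names[0]])
    fused = []
    for i in range(n_images):
        s = sum(weights[name] * scores_per_feature[name][i] for name in names)
        fused.append(s / total)
    return fused

# Hypothetical SVM decision values for three images from three feature channels.
scores = {
    "sift": [1.2, -0.4, 0.1],
    "hog": [0.8, -1.0, 0.5],
    "context": [0.2, 0.3, -0.6],
}
fused = late_fusion(scores)
fused_weighted = late_fusion(scores, {"sift": 2.0, "hog": 1.0, "context": 1.0})
```

The weights could be chosen by cross-validation per action class, which matches the spirit of the action-specific selection the description mentions.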

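To sum up: Spatial Pyramid w/ (dense SIFT | overlapping HOG) is the most effective descriptor template available. To use several of them together, fuse them with multiple kernels and learn the fusion weights. The results really are very good, no kidding.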
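For readers who have not built one, a spatial pyramid is just a stack of bag-of-words histograms computed over progressively finer grids. The sketch below assumes the descriptors have already been quantized to codeword indices; the function name, the toy points, and the uniform treatment of levels are assumptions (the original Spatial Pyramid Matching formulation weights levels differently, which is omitted here to keep the example short).

```python
def spatial_pyramid_histogram(points, words, vocab_size, width, height, levels=2):
    """Concatenate per-cell codeword histograms over a pyramid of grids.

    points: list of (x, y) descriptor locations, e.g. a dense sampling grid.
    words:  codeword index for each point.
    Level l splits the image into 2**l x 2**l cells.
    """
    pyramid = []
    for level in range(levels + 1):
        cells = 2 ** level
        hists = [[0.0] * vocab_size for _ in range(cells * cells)]
        for (x, y), word in zip(points, words):
            # Clamp so points on the right or bottom border stay in the last cell.
            cx = min(int(x * cells / width), cells - 1)
            cy = min(int(y * cells / height), cells - 1)
            hists[cy * cells + cx][word] += 1.0
        for h in hists:
            pyramid.extend(h)
    # Each point is counted once per level, so this makes the vector sum to 1.
    total = len(points) * (levels + 1)
    return [v / total for v in pyramid] if total else pyramid

# Four quantized descriptors in a 100 x 100 image, vocabulary of 3 words (toy data).
points = [(10, 10), (90, 20), (30, 80), (70, 70)]
words = [0, 1, 2, 1]
sp = spatial_pyramid_histogram(points, words, vocab_size=3, width=100, height=100)
```

With `levels=2` the vector has 3 * (1 + 4 + 16) = 63 entries. One such vector per feature type (dense SIFT, HOG, and so on) gives one kernel per feature, and those kernels are what gets fused.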

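So for problems of this kind, unless you really must invent your own descriptors, these are enough to reach your experimental goals, and they hold up in practical applications as well.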