Caffe: Getting Started with Faster R-CNN (Python)


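1. Installing and running Faster R-CNN
The installation steps below follow the tutorial published by the repository author on GitHub: https://github.com/rbgirshick/py-faster-rcnn

Install Caffe by following the official guide (see: Caffe installation instructions).
Note: uncomment the following two lines in Makefile.config: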

WITH_PYTHON_LAYER := 1
USE_CUDNN := 1

Clone Faster R-CNN to your machine:

git clone --recursive https://github.com/rbgirshick/py-faster-rcnn.git

Assume the root directory of the cloned repository is FRCN_ROOT.
Build the Cython modules:

cd $FRCN_ROOT/lib
make

Build Caffe and pycaffe:

cd $FRCN_ROOT/caffe-fast-rcnn
make -j8 && make pycaffe

Download the pre-computed Faster R-CNN detectors:

cd $FRCN_ROOT
./data/scripts/fetch_faster_rcnn_models.sh

Once the installation succeeds, run demo.py to test it. You can also try it on your own images:

cd $FRCN_ROOT
./tools/demo.py

For more details, see the official tutorial: https://github.com/rbgirshick/py-faster-rcnn

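2. Directory guide

caffe-fast-rcnn: the Caffe framework directory.
data: stores the pretrained models and the cache of loaded data, plus some scripts for downloading models.
experiments: stores configuration files and run logs. It also contains a scripts directory with the training scripts for the two training modes, end2end and alt_opt.
lib: stores the Python interface files. For example, datasets handles reading the dataset, and config holds the training configuration options.
models: contains three model definitions: the small ZF network, the medium VGG_CNN_M_1024 network, and the large VGG16 network. Choose one according to your hardware. ZF and VGG_CNN_M_1024 need at least 3 GB of GPU memory; VGG16 needs more, but no more than 11 GB.
output: holds the results produced by training. This directory only appears after training has been run.
tools: contains the Python scripts for training and testing.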
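
As a quick sanity check of the layout described above, the following minimal sketch lists which of those directories are missing under a given root. The helper name missing_dirs and the EXPECTED_DIRS list are illustrative assumptions, not part of py-faster-rcnn; output is left out because it only exists after training.

```python
import os

# Top-level directories described in the guide above.
# "output" is omitted on purpose: it only appears after training has run.
EXPECTED_DIRS = ["caffe-fast-rcnn", "data", "experiments", "lib", "models", "tools"]


def missing_dirs(frcn_root):
    """Return the expected subdirectories that are absent under frcn_root."""
    return [d for d in EXPECTED_DIRS
            if not os.path.isdir(os.path.join(frcn_root, d))]


if __name__ == "__main__":
    # FRCN_ROOT is read from the environment; fall back to the current directory.
    root = os.environ.get("FRCN_ROOT", ".")
    print("missing:", missing_dirs(root))
```

An empty list means every directory from the guide is present.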

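3. Building a dataset
3.1. Using the labelImg annotation tool

Install: sudo pip install labelImg
Run: labelImg

You can open a single image, or use Open Dir to load a whole folder. Use Create RectBox to draw a box around the target region, then assign a class label to that region. Use Next Image or Prev Image to move to the next or previous image. To fix a wrong box, click it and delete it. The tool is simple, so it is not covered in more detail here.
After saving, the annotations have the same format as the files in the VOC Annotations folder.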
　　

<annotation verified="no">
  <folder>images</folder>
  <filename>00002</filename>
  <path>/home/apple/work/py-faster-rcnn/images/00002.jpg</path>
  <source>
    <database>Unknown</database>
  </source>
  <size>
    <width>500</width>
    <height>375</height>
    <depth>3</depth>
  </size>
  <segmented>0</segmented>
  <object>
    <name>dog</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>2</xmin>
      <ymin>2</ymin>
      <xmax>264</xmax>
      <ymax>372</ymax>
    </bndbox>
  </object>
  <object>
    <name>cat</name>
    <pose>Unspecified</pose>
    <truncated>1</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>276</xmin>
      <ymin>82</ymin>
      <xmax>499</xmax>
      <ymax>375</ymax>
    </bndbox>
  </object>
</annotation>
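
To show how such an annotation can be read back, here is a minimal sketch using the standard library's xml.etree.ElementTree. The function name parse_voc_annotation and the shortened SAMPLE string are illustrative assumptions; this is not the loader that py-faster-rcnn itself uses.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the annotation shown above (only the fields read below).
SAMPLE = """<annotation verified="no">
  <filename>00002</filename>
  <object>
    <name>dog</name>
    <difficult>0</difficult>
    <bndbox><xmin>2</xmin><ymin>2</ymin><xmax>264</xmax><ymax>372</ymax></bndbox>
  </object>
  <object>
    <name>cat</name>
    <difficult>0</difficult>
    <bndbox><xmin>276</xmin><ymin>82</ymin><xmax>499</xmax><ymax>375</ymax></bndbox>
  </object>
</annotation>"""


def parse_voc_annotation(xml_text):
    """Return a list of {name, difficult, bbox} dicts, one per <object>."""
    root = ET.fromstring(xml_text)
    objects = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        bbox = tuple(int(box.find(tag).text)
                     for tag in ("xmin", "ymin", "xmax", "ymax"))
        objects.append({
            "name": obj.find("name").text,
            "difficult": int(obj.find("difficult").text),
            "bbox": bbox,
        })
    return objects


for item in parse_voc_annotation(SAMPLE):
    print(item["name"], item["bbox"])
```

To read a saved file instead of a string, ET.parse(path).getroot() can replace ET.fromstring.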

Reference blogs:
http://blog.csdn.net/jesse_mx/article/details/53606897
https://bealin.github.io/2016/10/23/Caffe%E5%AD%A6%E4%B9%A0%E7%B3%BB%E5%88%97%E2%80%94%E2%80%946%E4%BD%BF%E7%94%A8Faster-RCNN%E8%BF%9B%E8%A1%8C%E7%9B%AE%E6%A0%87%E6%A3%80%E6%B5%8B/

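3.2. Labeling with your own program

Goal: mark a bounding box around the target in each image and write one label per line in the following form:
image name, target class, start x, start y, end x, end y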

00001.jpg car 63 96 180 341
00002.jpg car 85 39 436 330
00003.jpg car 40 43 255 346
00004.jpg car 78 22 433 360
00005.jpg car 147 74 414 370
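
Lines in this format are easy to read back for later processing. The following is a minimal sketch; the names Label and parse_label_line are illustrative assumptions and do not come from the original script.

```python
from collections import namedtuple

# One record per line: image name, class, and the two box corners.
Label = namedtuple("Label", ["image", "cls", "xmin", "ymin", "xmax", "ymax"])


def parse_label_line(line):
    """Parse 'name class x1 y1 x2 y2' into a Label; raise on malformed input."""
    parts = line.split()
    if len(parts) != 6:
        raise ValueError("expected 6 fields, got %d: %r" % (len(parts), line))
    coords = [int(v) for v in parts[2:]]
    return Label(parts[0], parts[1], *coords)


label = parse_label_line("00001.jpg car 63 96 180 341")
print(label.image, label.cls, label.xmin, label.ymin, label.xmax, label.ymax)
```

Note that splitting on whitespace assumes neither the image name nor the class name contains spaces.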

Implementation:

# -*- coding: utf-8 -*-
import os
import cv2
import numpy as np

# True while the left mouse button is held down
drawing = False
ix, iy = -1, -1   # start point of the box
ox, oy = -1, -1   # end point of the box


# Mouse callback
def draw_circle(event, x, y, flags, param):
    global ix, iy, ox, oy, drawing
    # Left button pressed: record the start position
    if event == cv2.EVENT_LBUTTONDOWN:
        drawing = True
        ix, iy = x, y
    # Mouse moved while the left button is held: draw the rectangle.
    # event reports the movement, flags reports whether the button is down.
    elif event == cv2.EVENT_MOUSEMOVE and flags == cv2.EVENT_FLAG_LBUTTON:
        if drawing:
            cv2.rectangle(image, (ix, iy), (x, y), (0, 255, 0), -1)
            ox, oy = x, y
    elif event == cv2.EVENT_LBUTTONUP:
        drawing = False   # the original used "==" here, which had no effect


number = 0
jpg = ".jpg"
Image_Path = "./images"
f_wrect = open('images.txt', 'a')
for file in os.listdir(Image_Path):
    # Rename each image to a zero-padded 5-digit name, e.g. 00001.jpg
    number = number + 1
    string_number = '%d' % number
    i = len(string_number)
    while (5 - i) > 0:
        string_number = '0' + string_number
        i = i + 1
    newname = string_number + jpg
    old_NamePath = os.path.join(Image_Path, file)
    new_NamePath = os.path.join(Image_Path, newname)
    os.rename(old_NamePath, new_NamePath)
    image = cv2.imread(new_NamePath)
    cv2.namedWindow('image')
    cv2.setMouseCallback('image', draw_circle)
    while True:
        cv2.imshow('image', image)
        # Press q to save the box for this image and close its window
        if cv2.waitKey(1) & 0xFF == ord('q'):
            print('ok')
            # 'ceramic' is the class label written for every box
            image_rect = newname + ' ceramic ' + '%d' % ix + ' ' + '%d' % iy + ' ' + '%d' % ox + ' ' + '%d' % oy + '\n'
            f_wrect.write(image_rect)
            break
    cv2.destroyWindow('image')
f_wrect.close()
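
The script above writes the drag start and end points exactly as recorded, so dragging from bottom-right to top-left would produce a line whose first corner is larger than the second. A minimal sketch of two helpers that could tidy this up follows; the names padded_name and format_label_line are hypothetical and are not part of the original script.

```python
def padded_name(number, ext=".jpg", width=5):
    """Build a zero-padded file name such as 00007.jpg."""
    return "%0*d%s" % (width, number, ext)


def format_label_line(image_name, cls, x0, y0, x1, y1):
    """Format one label line, ordering the corners so that min comes first."""
    xmin, xmax = min(x0, x1), max(x0, x1)
    ymin, ymax = min(y0, y1), max(y0, y1)
    return "%s %s %d %d %d %d\n" % (image_name, cls, xmin, ymin, xmax, ymax)


# A box dragged from bottom-right (180, 341) to top-left (63, 96)
print(format_label_line(padded_name(1), "car", 180, 341, 63, 96), end="")
```

The "%0*d" format takes the padding width as an argument, which replaces the manual while loop used for zero-padding.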

Reference blog: http://www.cnblogs.com/YangQiaoblog/p/6782183.html

To be continued.

For some of the smaller details that may be unclear, see the following blogs:

Purpose of the LRN layer: http://blog.csdn.net/u014114990/article/details/47662189
ROI Pooling layer: http://blog.csdn.net/lanran2/article/details/60143861
SmoothL1Loss layer: http://blog.csdn.net/xyy19920105/article/details/50421225
numpy.where(): http://blog.csdn.net/lanchunhui/article/details/49489205

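    When called with a condition on a 2-D array, np.where()[0] holds the row indices of the matching elements and np.where()[1] holds their column indices.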
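
A small example makes the row and column indices concrete. This is a sketch with made-up values; the array names and the 0.5 threshold are assumptions chosen for illustration.

```python
import numpy as np

# 2-D case: np.where returns one index array per dimension.
scores = np.array([[0.1, 0.9],
                   [0.8, 0.2]])
rows, cols = np.where(scores > 0.5)
print(rows)  # [0 1]
print(cols)  # [1 0]

# 1-D case: detection code often keeps only the entries above a threshold.
det_scores = np.array([0.3, 0.7, 0.95])
keep = np.where(det_scores >= 0.5)[0]
print(keep)  # [1 2]
```

The matches are (row 0, column 1) and (row 1, column 0), reported in row-major order.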

numpy.hstack(): http://blog.csdn.net/garfielder007/article/details/51378296
Stack arrays in sequence horizontally (column wise). In other words, the arrays are joined side by side along the horizontal direction.
Example:

    >>> a = np.array((1,2,3))
    >>> b = np.array((2,3,4))
    >>> np.hstack((a,b))
    array([1, 2, 3, 2, 3, 4])
    >>> a = np.array([[1],[2],[3]])
    >>> b = np.array([[2],[3],[4]])
    >>> np.hstack((a,b))
    array([[1, 2],
           [2, 3],
           [3, 4]])
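
In detection code, hstack is often used to append a score column to an array of boxes. The following is a minimal sketch with made-up boxes and scores; the variable names are assumptions for illustration.

```python
import numpy as np

# Two boxes as (xmin, ymin, xmax, ymax) and one score per box.
boxes = np.array([[10, 20, 110, 220],
                  [30, 40, 130, 240]], dtype=np.float32)
scores = np.array([0.9, 0.6], dtype=np.float32)

# scores is 1-D, so add a second axis to make it a column before stacking.
dets = np.hstack((boxes, scores[:, np.newaxis]))
print(dets.shape)  # (2, 5)
```

Without the np.newaxis step, hstack would raise an error because a 2-D array cannot be stacked horizontally with a 1-D array.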

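numpy.random.permutation(arrays): returns a shuffled copy of the array, so the original array is left unchanged.
numpy.random.shuffle(arrays): shuffles the original array in place and returns nothing.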

import numpy as np
arrays = np.array([1, 2, 3, 4])
print(np.random.permutation(arrays))
print(arrays)
print(np.random.shuffle(arrays))
print(arrays)

Result (the order is random, so it varies between runs):
[4 2 3 1]
[1 2 3 4]  # the original array is unchanged
None
[1 4 2 3]
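
Because permutation returns a new ordering rather than modifying its input, a common pattern is to permute indices and use them to reorder several arrays consistently. The sketch below assumes made-up image names and labels; shuffled_indices is a hypothetical helper name.

```python
import numpy as np


def shuffled_indices(n, seed=None):
    """Return a random ordering of 0..n-1; nothing is modified in place."""
    rng = np.random.RandomState(seed)
    return rng.permutation(n)


images = np.array(["00001.jpg", "00002.jpg", "00003.jpg", "00004.jpg"])
labels = np.array(["car", "dog", "cat", "car"])

order = shuffled_indices(len(images), seed=0)
# Indexing both arrays with the same order keeps each image paired with its label.
shuffled_images = images[order]
shuffled_labels = labels[order]
print(order, shuffled_images, shuffled_labels)
```

Passing a seed makes the ordering repeatable, which helps when debugging a training run.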

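np.reshape(arrays,(-1,2)): rearranges arrays into an array with 2 columns; the dimension given as -1 is inferred from the total number of elements. Wherever the -1 appears, the elements are filled in row-major order.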

Python example:
import numpy as np
arrays = np.array([1, 2, 3, 4])
print(np.reshape(arrays, (-1, 2)))
print(np.reshape(arrays, (-1, 4)))
print(np.reshape(arrays, (2, -1)))
print(np.reshape(arrays, (4, -1)))

Result:
[[1 2]
 [3 4]]
[[1 2 3 4]]
[[1 2]
 [3 4]]
[[1]
 [2]
 [3]
 [4]]