Data Format Conversion for Deep Learning (MobileNet+SSD, CenterNet)



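Building an initial VOC2012-style dataset

1. Data annotation

1.1 Annotating images that contain targets

Use labelImg to annotate the images. Download link: https://github.com/tzutalin/labelImg
Points to watch while annotating (adapted from https://blog.csdn.net/chenmaolin88/article/details/79357502):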


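What to annotate	Every object instance of every predefined class (if an image contains 3 raccoons, annotate each of the 3 separately), unless: you cannot tell what the thing is; the object is extremely small (use your own judgment on scale); or less than 10-20% of the object is visible, so you cannot tell which class it belongs to (for example, you can see only a tire and cannot tell whether it is a truck or a car). Such objects may be left unannotated. If the objects in an image are hard to recognize even by eye, discard the image.
Difficult	If the object can be roughly recognized by eye but with low confidence, tick the difficult checkbox to mark the object as hard to recognize.
Bounding box	Draw the box around the visible region of the object. Do not annotate invisible regions or non-object regions. The box should contain all visible pixels of the object and nothing else, unless including a very small part of the object would require enlarging the box considerably. For example, a car's antenna can be left out because it is tiny and is not a key feature of the car.
Truncated	If more than 15-20% of the object lies outside the box, mark the object as Truncated. This flag means the box does not contain the complete object instance. It cannot be ticked directly in LabelImg; edit the corresponding tag in the XML file by hand.
Occlusion	If more than 5% of the object inside the box is occluded, mark it as Occluded. This flag indicates that the image region inside the box is partly occluded. It cannot be ticked directly in LabelImg; edit the corresponding tag in the XML file by hand.
Clothing, snow, mud, etc.	If the occluder is strongly tied to the object, do not mark it as occluded. For example, clothes on a person count as part of the person.
Transparency	Objects seen through glass should also be annotated. If the glass is somewhat reflective, the reflection on the glass should be marked as occlusion.
Mirrors	Objects seen in a mirror should also be annotated.
Posters	Objects on posters, magazines, and similar items in the image should also be annotated, unless they are heavily exaggerated cartoons.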
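
Since the Truncated and Occluded flags have to be edited in the XML by hand, a small script can save some typing. The snippet below is a minimal sketch using Python's standard xml.etree.ElementTree. The helper name set_object_flag and the sample annotation are made up for illustration; the tag names truncated and occluded are assumed to follow the Pascal VOC convention.

```python
import xml.etree.ElementTree as ET


def set_object_flag(xml_text, object_index, tag, value):
    """Set <tag> on the object_index-th <object>, creating the tag if it is absent."""
    root = ET.fromstring(xml_text)
    obj = root.findall('object')[object_index]
    node = obj.find(tag)
    if node is None:
        node = ET.SubElement(obj, tag)
    node.text = str(int(value))
    return ET.tostring(root, encoding='unicode')


# Hypothetical annotation with a single object and no truncated/occluded tags.
sample = (
    '<annotation><filename>raccoon.jpg</filename>'
    '<object><name>raccoon</name><difficult>0</difficult>'
    '<bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>220</ymax></bndbox>'
    '</object></annotation>'
)

updated = set_object_flag(sample, 0, 'truncated', 1)
updated = set_object_flag(updated, 0, 'occluded', 1)
print(updated)
```

For real files, read the XML text from disk, call the helper, and write the result back.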

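1.2 Annotating images without targets

Use the following code to generate XML files automatically for images that contain no annotated target.
Reference: https://www.jianshu.com/p/5b2254fdf8f8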

#!/usr/bin/python
# -*- coding: UTF-8 -*-
import glob
import os

from PIL import Image

# Directory containing the images (here, the negative samples)
src_img_dir = "/home/lehui/Desktop/负样本700"
# Directory where the generated XML files are written
src_xml_dir = "/home/lehui/Desktop/xml"

for img_path in glob.glob(os.path.join(src_img_dir, '*.jpg')):
    # Base name without extension, e.g. 100.jpg -> 100
    img = os.path.splitext(os.path.basename(img_path))[0]

    with Image.open(img_path) as im:
        width, height = im.size

    # Write an annotation that has a <size> block but no <object> entries
    with open(os.path.join(src_xml_dir, img + '.xml'), 'w') as xml_file:
        xml_file.write('<annotation>\n')
        xml_file.write('    <folder>VOC2007</folder>\n')
        xml_file.write('    <filename>' + img + '.jpg' + '</filename>\n')
        # <path> points at the image file, so it uses the image directory
        xml_file.write('    <path>' + src_img_dir + '/' + img + '.jpg' + '</path>\n')
        xml_file.write('    <source>\n')
        xml_file.write('        <database>Unknown</database>\n')
        xml_file.write('    </source>\n')
        xml_file.write('    <size>\n')
        xml_file.write('        <width>' + str(width) + '</width>\n')
        xml_file.write('        <height>' + str(height) + '</height>\n')
        xml_file.write('        <depth>3</depth>\n')
        xml_file.write('    </size>\n')
        xml_file.write('    <segmented>0</segmented>\n')
        xml_file.write('</annotation>')
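
As an alternative to writing the tags as raw strings, the same empty annotation can be built with xml.etree.ElementTree, which takes care of escaping and nesting. This is only a sketch: build_empty_annotation and count_objects are hypothetical helper names, and the field values mirror the script above.

```python
import xml.etree.ElementTree as ET


def build_empty_annotation(filename, width, height, depth=3, folder='VOC2007'):
    """Return a VOC-style annotation string that contains no <object> entries."""
    root = ET.Element('annotation')
    ET.SubElement(root, 'folder').text = folder
    ET.SubElement(root, 'filename').text = filename
    size = ET.SubElement(root, 'size')
    ET.SubElement(size, 'width').text = str(width)
    ET.SubElement(size, 'height').text = str(height)
    ET.SubElement(size, 'depth').text = str(depth)
    ET.SubElement(root, 'segmented').text = '0'
    return ET.tostring(root, encoding='unicode')


def count_objects(xml_text):
    """Count <object> entries; a negative sample should have none."""
    return len(ET.fromstring(xml_text).findall('object'))


xml_text = build_empty_annotation('100.jpg', 640, 480)
print(count_objects(xml_text))  # prints 0
```

count_objects can also be run over an annotation folder to check which files are negative samples.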

2. Splitting into training, validation, and test sets

Use the following code to obtain the sets of XML files for train, val, and test.
Reference: https://www.cnblogs.com/gezhuangzhuang/p/10613468.html

import os
import random
import shutil
import time

# xmlfilepath: directory with all XML files; saveBasePath: where the results are saved.
# Three folders are created under saveBasePath: train, validation, test.
xmlfilepath = r'./Annotations'
saveBasePath = r'./Annotations'

trainval_percent = 0.8
train_percent = 0.8

# Keep only .xml files so the folders created below are skipped on a re-run
total_xml = [f for f in os.listdir(xmlfilepath) if f.endswith('.xml')]
num = len(total_xml)
indices = range(num)
tv = int(num * trainval_percent)
tr = int(tv * train_percent)
trainval = set(random.sample(indices, tv))
train = set(random.sample(sorted(trainval), tr))

print("train and val size", tv)
print("train size", tr)

start = time.time()
counts = {"train": 0, "validation": 0, "test": 0}

for i in indices:
    name = total_xml[i]
    if i in train:
        directory = "train"
    elif i in trainval:
        directory = "validation"
    else:
        directory = "test"
    counts[directory] += 1

    # Create the target folder on first use, then copy the XML file into it
    target_dir = os.path.join(saveBasePath, directory)
    os.makedirs(target_dir, exist_ok=True)
    shutil.copyfile(os.path.join(xmlfilepath, name), os.path.join(target_dir, name))

seconds = time.time() - start
print("train total : " + str(counts["train"]))
print("validation total : " + str(counts["validation"]))
print("test total : " + str(counts["test"]))
print("total number : " + str(sum(counts.values())))
print("Time taken : {0} seconds".format(seconds))
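
The split above changes on every run because the random generator is not seeded. If a reproducible split is wanted, the selection logic can be isolated in a small function, as in the sketch below. split_names and its seed parameter are assumptions added for illustration; the two percentages are the same 0.8 values used in the script.

```python
import random


def split_names(names, trainval_percent=0.8, train_percent=0.8, seed=0):
    """Split file names into (train, val, test) lists, reproducibly for a given seed."""
    rng = random.Random(seed)
    shuffled = sorted(names)  # sort first so the result does not depend on listing order
    rng.shuffle(shuffled)
    n_trainval = int(len(shuffled) * trainval_percent)
    n_train = int(n_trainval * train_percent)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_trainval]
    test = shuffled[n_trainval:]
    return train, val, test


names = ['{:04d}.xml'.format(i) for i in range(100)]
train, val, test = split_names(names)
print(len(train), len(val), len(test))  # prints 64 16 20
```

With 100 files this gives 64 for training, 16 for validation, and 20 for testing, matching the int() rounding in the original script.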

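3. Dataset layout

: all the XML files
: all the images
: the txt files for the training, test, and validation sets generated in step 2
Label class configuration file (label_map.txt)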

item {
  id: 1    # ids are numbered starting from 1
  name: 'red pedestrian'
}
item {
  id: 2
  name: 'green pedestrian'
}
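
When there are more than a handful of classes, typing the label map by hand gets error-prone. The sketch below generates the same item blocks from a list of class names, numbering ids from 1 as in the example above. make_label_map is a hypothetical helper, not part of the Object Detection API.

```python
def make_label_map(class_names):
    """Return label map text with one item block per class; ids start at 1."""
    blocks = []
    for class_id, name in enumerate(class_names, start=1):
        blocks.append("item {\n  id: %d\n  name: '%s'\n}\n" % (class_id, name))
    return "\n".join(blocks)


label_map_text = make_label_map(['red pedestrian', 'green pedestrian'])
print(label_map_text)
```

Write label_map_text to label_map.txt to use it in the later steps.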

Generating TFRecord data

This step relies on \models\research\object_detection\dataset_tools\create_pascal_tf_record.py.
Modify the corresponding parameters and run it to generate the data in TFRecord format.
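
One detail worth knowing when checking the output: the Object Detection API's TFRecord examples store box corners normalized to the image size rather than in pixels. The sketch below shows only that arithmetic in plain Python. normalize_box is a hypothetical helper written for illustration; it is not a function from the API.

```python
def normalize_box(xmin, ymin, xmax, ymax, width, height):
    """Convert pixel box corners into fractions of the image width and height."""
    if width <= 0 or height <= 0:
        raise ValueError('image size must be positive')
    return (xmin / width, ymin / height, xmax / width, ymax / height)


normalized = normalize_box(30, 60, 150, 240, 300, 300)
print(normalized)
```

A box whose normalized values fall outside the 0 to 1 range usually means the <size> block in the XML does not match the real image.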

Generating JSON data

Generate the corresponding JSON from the XML files.
Reference: https://blog.csdn.net/weixin_41765699/article/details/100124689

import xml.etree.ElementTree as ET
import os
import json

coco = dict()
coco['images'] = []
coco['type'] = 'instances'
coco['annotations'] = []
coco['categories'] = []

category_set = dict()
image_set = set()

category_item_id = -1
image_id = 20180000000
annotation_id = 0


def addCatItem(name):
    global category_item_id
    category_item = dict()
    category_item['supercategory'] = 'none'
    category_item_id += 1
    category_item['id'] = category_item_id
    category_item['name'] = name
    coco['categories'].append(category_item)
    category_set[name] = category_item_id
    return category_item_id


def addImgItem(file_name, size):
    global image_id
    if file_name is None:
        raise Exception('Could not find filename tag in xml file.')
    if size['width'] is None:
        raise Exception('Could not find width tag in xml file.')
    if size['height'] is None:
        raise Exception('Could not find height tag in xml file.')
    image_id += 1
    image_item = dict()
    image_item['id'] = image_id
    image_item['file_name'] = file_name
    image_item['width'] = size['width']
    image_item['height'] = size['height']
    coco['images'].append(image_item)
    image_set.add(file_name)
    return image_id


def addAnnoItem(object_name, image_id, category_id, bbox):
    global annotation_id
    annotation_item = dict()
    annotation_item['segmentation'] = []
    seg = []
    # bbox[] is x,y,w,h
    # left_top
    seg.append(bbox[0])
    seg.append(bbox[1])
    # left_bottom
    seg.append(bbox[0])
    seg.append(bbox[1] + bbox[3])
    # right_bottom
    seg.append(bbox[0] + bbox[2])
    seg.append(bbox[1] + bbox[3])
    # right_top
    seg.append(bbox[0] + bbox[2])
    seg.append(bbox[1])

    annotation_item['segmentation'].append(seg)
    annotation_item['area'] = bbox[2] * bbox[3]
    annotation_item['iscrowd'] = 0
    annotation_item['ignore'] = 0
    annotation_item['image_id'] = image_id
    annotation_item['bbox'] = bbox
    annotation_item['category_id'] = category_id
    annotation_id += 1
    annotation_item['id'] = annotation_id
    coco['annotations'].append(annotation_item)


def parseXmlFiles(xml_path):
    for f in os.listdir(xml_path):
        if not f.endswith('.xml'):
            continue

        bndbox = dict()
        size = dict()
        current_image_id = None
        current_category_id = None
        file_name = None
        size['width'] = None
        size['height'] = None
        size['depth'] = None

        xml_file = os.path.join(xml_path, f)
        print(xml_file)

        tree = ET.parse(xml_file)
        root = tree.getroot()
        if root.tag != 'annotation':
            raise Exception('pascal voc xml root element should be annotation, rather than {}'.format(root.tag))

        # elem is <folder>, <filename>, <size>, <object>
        for elem in root:
            current_parent = elem.tag
            current_sub = None
            object_name = None

            if elem.tag == 'folder':
                continue

            if elem.tag == 'filename':
                file_name = elem.text
                if file_name in category_set:
                    raise Exception('file_name duplicated')

            # add img item only after parse <size> tag
            elif current_image_id is None and file_name is not None and size['width'] is not None:
                if file_name not in image_set:
                    current_image_id = addImgItem(file_name, size)
                    print('add image with {} and {}'.format(file_name, size))
                else:
                    raise Exception('duplicated image: {}'.format(file_name))

            # subelem is <width>, <height>, <depth>, <name>, <bndbox>
            for subelem in elem:
                bndbox['xmin'] = None
                bndbox['xmax'] = None
                bndbox['ymin'] = None
                bndbox['ymax'] = None

                current_sub = subelem.tag
                if current_parent == 'object' and subelem.tag == 'name':
                    object_name = subelem.text
                    if object_name not in category_set:
                        current_category_id = addCatItem(object_name)
                    else:
                        current_category_id = category_set[object_name]

                elif current_parent == 'size':
                    if size[subelem.tag] is not None:
                        raise Exception('xml structure broken at size tag.')
                    size[subelem.tag] = int(subelem.text)

                # option is <xmin>, <ymin>, <xmax>, <ymax>, when subelem is <bndbox>
                for option in subelem:
                    if current_sub == 'bndbox':
                        if bndbox[option.tag] is not None:
                            raise Exception('xml structure corrupted at bndbox tag.')
                        bndbox[option.tag] = int(option.text)

                # only after parse the <object> tag
                if bndbox['xmin'] is not None:
                    if object_name is None:
                        raise Exception('xml structure broken at bndbox tag')
                    if current_image_id is None:
                        raise Exception('xml structure broken at bndbox tag')
                    if current_category_id is None:
                        raise Exception('xml structure broken at bndbox tag')
                    bbox = []
                    # x
                    bbox.append(bndbox['xmin'])
                    # y
                    bbox.append(bndbox['ymin'])
                    # w
                    bbox.append(bndbox['xmax'] - bndbox['xmin'])
                    # h
                    bbox.append(bndbox['ymax'] - bndbox['ymin'])
                    print('add annotation with {},{},{},{}'.format(object_name, current_image_id,
                                                                   current_category_id, bbox))
                    addAnnoItem(object_name, current_image_id, current_category_id, bbox)


if __name__ == '__main__':
    # Only these two parameters need to be changed.
    xml_path = r'Z:\pycharm_projects\ssd\VOCtest60\Annotations'  # directory containing the XML files
    json_file = './test.json'                                    # JSON file to generate
    parseXmlFiles(xml_path)
    with open(json_file, 'w') as out:
        json.dump(coco, out)
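
The core of the conversion is the change of box convention: VOC stores the two corners (xmin, ymin, xmax, ymax), while the script above writes COCO-style [x, y, w, h] plus an area and a rectangular segmentation polygon. The sketch below isolates that arithmetic so it can be checked by hand. voc_to_coco_bbox and bbox_to_polygon are hypothetical helper names; the corner order of the polygon follows addAnnoItem.

```python
def voc_to_coco_bbox(xmin, ymin, xmax, ymax):
    """Convert VOC corners to a COCO-style [x, y, w, h] box and its area."""
    w = xmax - xmin
    h = ymax - ymin
    return [xmin, ymin, w, h], w * h


def bbox_to_polygon(bbox):
    """Rectangle polygon in the order left-top, left-bottom, right-bottom, right-top."""
    x, y, w, h = bbox
    return [x, y, x, y + h, x + w, y + h, x + w, y]


bbox, area = voc_to_coco_bbox(10, 20, 110, 220)
print(bbox, area)             # prints [10, 20, 100, 200] 20000
print(bbox_to_polygon(bbox))  # prints [10, 20, 10, 220, 110, 220, 110, 20]
```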

3. Model conversion

3.1 pb to pbtxt

Edit the input parameters in D:\Program Files\opencv\sources\samples\dnn\tf_text_graph_ssd.py and run it to obtain the .pbtxt file.

3.2 pt to pth

3.3 pth to onnx

3.4 onnx to ncnn

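4. Running the MobileNet-SSD model with the OpenCV dnn module

Copy the four files that describe the model (the .data-00000-of-00001, .index, and .meta files plus the checkpoint file), then run the following command: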

python object_detection/export_inference_graph.py --input_type=image_tensor --pipeline_config_path=\models\ssd_mobilenet_v2_coco.config --trained_checkpoint_prefix=\pedestrian_data\model\model.ckpt-1589 --output_directory=\pedestrian_data\test

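This produces the corresponding pb model.
To convert the pb model trained with ssd_mobilenet_v2 into a pbtxt, edit the input parameters in D:\Program Files\opencv\sources\samples\dnn\tf_text_graph_ssd.py and run it to obtain the .pbtxt file. Then read the network with the OpenCV dnn module and run detection on the C++ side. The code is as follows:
Reference: https://blog.csdn.net/atpalain_csdn/article/details/100098720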

#include <opencv2/core.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/dnn.hpp>

#include <algorithm>
#include <string>
#include <vector>
#include <iostream>
#include <time.h>

using namespace std;
using namespace cv;
using namespace dnn;

float confThreshold, nmsThreshold;
std::vector<std::string> classes;

void postprocess(Mat& frame, const std::vector<Mat>& out, Net& net);
void drawPred(int classId, float conf, int left, int top, int right, int bottom, Mat& frame);

int main(int argc, char** argv)
{
    // Configure according to the chosen detection model files
    confThreshold = 0.5;
    nmsThreshold = 0.4;

    float scale = 1.0;
    Scalar mean = { 0, 0, 0 };
    bool swapRB = true;
    int inpWidth = 300;
    int inpHeight = 300;

    String modelPath = "frozen_inference_graph.pb";
    String configPath = "frozen_inference_graph.pbtxt";
    string image_file = "E:\\project\\data\\hands_data\\img1125\\";
    String framework = "";

    int backendId = cv::dnn::DNN_BACKEND_OPENCV;
    int targetId = cv::dnn::DNN_TARGET_CPU;

    //String classesFile = R"(object_detection_classes_coco.txt)";

    // Open file with classes names.
    //if (!classesFile.empty()) {
    //	const std::string& file = classesFile;
    //	std::ifstream ifs(file.c_str());
    //	if (!ifs.is_open())
    //		CV_Error(Error::StsError, "File " + file + " not found");
    //	std::string line;
    //	while (std::getline(ifs, line)) {
    //		classes.push_back(line);
    //	}
    //}
    classes.push_back("raiseHand");

    // Load a model.
    Net net = readNet(modelPath, configPath, framework);
    net.setPreferableBackend(backendId);
    net.setPreferableTarget(targetId);

    std::vector<String> outNames = net.getUnconnectedOutLayersNames();

    // Create a window
    static const std::string kWinName = "Deep learning object detection in OpenCV";

    // Process frames.
    Mat frame, blob;
    namedWindow(kWinName, 0);

    vector< cv::String > files;
    cv::glob(image_file, files);

    for (size_t i = 0; i < files.size(); i++)
    {
        frame = cv::imread(files[i]);
        //cv::Mat image(90, 120, CV_8UC3, cv::Scalar::all(0));
        if (frame.empty())
        {
            return 0;
        }
        //frame = imread("E:\\project\\data\\hands_data\\train_ssd\\hands_data_pos+neg\\raisehandData\\test\\0010.jpg");

        // Create a 4D blob from a frame.
        Size inpSize(inpWidth > 0 ? inpWidth : frame.cols,
                     inpHeight > 0 ? inpHeight : frame.rows);
        blobFromImage(frame, blob, scale, inpSize, mean, swapRB, false);

        // Run a model.
        net.setInput(blob);
        if (net.getLayer(0)->outputNameToIndex("im_info") != -1)  // Faster-RCNN or R-FCN
        {
            resize(frame, frame, inpSize);
            Mat imInfo = (Mat_<float>(1, 3) << inpSize.height, inpSize.width, 1.6f);
            net.setInput(imInfo, "im_info");
        }

        std::vector<Mat> outs;
        net.forward(outs, outNames);

        postprocess(frame, outs, net);

        // Put efficiency information.
        std::vector<double> layersTimes;
        double freq = getTickFrequency() / 1000;
        double t = net.getPerfProfile(layersTimes) / freq;
        std::string label = format("Inference time: %.2f ms", t);
        cout << label << endl;
        putText(frame, label, Point(0, 15), FONT_HERSHEY_PLAIN, 0.5, Scalar(0, 255, 0));

        imshow(kWinName, frame);
        waitKey(1);
    }
    return 0;
}

void postprocess(Mat& frame, const std::vector<Mat>& outs, Net& net)
{
    static std::vector<int> outLayers = net.getUnconnectedOutLayers();
    static std::string outLayerType = net.getLayer(outLayers[0])->type;

    std::vector<int> classIds;
    std::vector<float> confidences;
    std::vector<Rect> boxes;
    if (net.getLayer(0)->outputNameToIndex("im_info") != -1)  // Faster-RCNN or R-FCN
    {
        // Network produces output blob with a shape 1x1xNx7 where N is a number of
        // detections and an every detection is a vector of values
        // [batchId, classId, confidence, left, top, right, bottom]
        CV_Assert(outs.size() == 1);
        float* data = (float*)outs[0].data;
        for (size_t i = 0; i < outs[0].total(); i += 7) {
            float confidence = data[i + 2];
            if (confidence > confThreshold) {
                int left = (int)data[i + 3];
                int top = (int)data[i + 4];
                int right = (int)data[i + 5];
                int bottom = (int)data[i + 6];
                int width = right - left + 1;
                int height = bottom - top + 1;
                classIds.push_back((int)(data[i + 1]) - 1);  // Skip 0th background class id.
                boxes.push_back(Rect(left, top, width, height));
                confidences.push_back(confidence);
            }
        }
    }
    else if (outLayerType == "DetectionOutput") {
        // Network produces output blob with a shape 1x1xNx7 where N is a number of
        // detections and an every detection is a vector of values
        // [batchId, classId, confidence, left, top, right, bottom]
        CV_Assert(outs.size() == 1);
        float* data = (float*)outs[0].data;
        for (size_t i = 0; i < outs[0].total(); i += 7) {
            float confidence = data[i + 2];
            if (confidence > confThreshold) {
                int left = (int)(data[i + 3] * frame.cols);
                int top = (int)(data[i + 4] * frame.rows);
                int right = (int)(data[i + 5] * frame.cols);
                int bottom = (int)(data[i + 6] * frame.rows);
                int width = right - left + 1;
                int height = bottom - top + 1;
                classIds.push_back((int)(data[i + 1]) - 1);  // Skip 0th background class id.
                boxes.push_back(Rect(left, top, width, height));
                confidences.push_back(confidence);
            }
        }
    }
    else if (outLayerType == "Region") {
        for (size_t i = 0; i < outs.size(); ++i) {
            // Network produces output blob with a shape NxC where N is a number of
            // detected objects and C is a number of classes + 4 where the first 4
            // numbers are [center_x, center_y, width, height]
            float* data = (float*)outs[i].data;
            for (int j = 0; j < outs[i].rows; ++j, data += outs[i].cols) {
                Mat scores = outs[i].row(j).colRange(5, outs[i].cols);
                Point classIdPoint;
                double confidence;
                minMaxLoc(scores, 0, &confidence, 0, &classIdPoint);
                if (confidence > confThreshold) {
                    int centerX = (int)(data[0] * frame.cols);
                    int centerY = (int)(data[1] * frame.rows);
                    int width = (int)(data[2] * frame.cols);
                    int height = (int)(data[3] * frame.rows);
                    int left = centerX - width / 2;
                    int top = centerY - height / 2;

                    classIds.push_back(classIdPoint.x);
                    confidences.push_back((float)confidence);
                    boxes.push_back(Rect(left, top, width, height));
                }
            }
        }
    }
    else
        CV_Error(Error::StsNotImplemented, "Unknown output layer type: " + outLayerType);

    std::vector<int> indices;
    NMSBoxes(boxes, confidences, confThreshold, nmsThreshold, indices);
    for (size_t i = 0; i < indices.size(); ++i) {
        int idx = indices[i];
        Rect box = boxes[idx];
        drawPred(classIds[idx], confidences[idx], box.x, box.y,
                 box.x + box.width, box.y + box.height, frame);
    }
}

void drawPred(int classId, float conf, int left, int top, int right, int bottom, Mat& frame)
{
    rectangle(frame, Point(left, top), Point(right, bottom), Scalar(0, 255, 0));

    std::string label = format("%.2f", conf);
    if (!classes.empty()) {
        CV_Assert(classId < (int)classes.size());
        label = classes[classId] + ": " + label;
    }

    int baseLine;
    Size labelSize = getTextSize(label, FONT_HERSHEY_SIMPLEX, 0.2, 1, &baseLine);

    top = std::max(top, labelSize.height);
    rectangle(frame, Point(left, top - labelSize.height),
              Point(left + labelSize.width, top + baseLine), Scalar::all(255), FILLED);
    putText(frame, label, Point(left, top), FONT_HERSHEY_SIMPLEX, 0.2, Scalar());
}
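
For an SSD model the branch that matters in postprocess is the DetectionOutput one: the network returns a flat 1x1xNx7 blob, and each group of 7 values is [batchId, classId, confidence, left, top, right, bottom] with coordinates relative to the frame size. The Python sketch below mirrors that decoding step on a plain list so the arithmetic is easy to follow. decode_detections is a hypothetical name, the sample values are invented, and non-maximum suppression is left out.

```python
def decode_detections(flat, frame_w, frame_h, conf_threshold=0.5):
    """Decode a flat Nx7 detection list into (class_id, confidence, (x, y, w, h)) tuples."""
    results = []
    for i in range(0, len(flat) - 6, 7):
        _batch_id, class_id, conf, left, top, right, bottom = flat[i:i + 7]
        if conf <= conf_threshold:
            continue
        # Relative coordinates are scaled to pixels, as in the C++ code
        x1 = int(left * frame_w)
        y1 = int(top * frame_h)
        x2 = int(right * frame_w)
        y2 = int(bottom * frame_h)
        # Class ids are shifted by one to skip the background class
        results.append((int(class_id) - 1, conf, (x1, y1, x2 - x1 + 1, y2 - y1 + 1)))
    return results


# Two invented detections on a 300x200 frame; the second is below the threshold.
flat = [0, 1, 0.9, 0.25, 0.25, 0.5, 0.75,
        0, 1, 0.3, 0.0, 0.0, 0.25, 0.25]
detections = decode_detections(flat, 300, 200)
print(detections)  # prints [(0, 0.9, (75, 50, 76, 101))]
```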

Troubleshooting

Error when converting pb to pbtxt:

_graph_ssd.py --input frozen_inference_graph.pb --config ssd_mobilenet_v2_coco.config --output graph.pbtxt
Scale: [0.200000-0.950000]
Aspect ratios: [1.0, 2.0, 0.5, 3.0, 0.3333]
Reduce boxes in the lowest layer: True
Number of classes: 1
Number of layers: 6
box predictor: convolutional
Input image size: 300x300
Traceback (most recent call last):
  File "tf_text_graph_ssd.py", line 368, in <module>
    createSSDGraph(args.input, args.config, args.output)
  File "tf_text_graph_ssd.py", line 232, in createSSDGraph
    assert(graph_def.node[0].op == 'Placeholder')
AssertionError

Before converting, first process the pb file as follows:

import tensorflow as tf
from tensorflow.tools.graph_transforms import TransformGraph

# Uses the TensorFlow 1.x API (tf.gfile, tf.GraphDef)
with tf.gfile.FastGFile('ssdlite.pb', 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

# Reorder the nodes so the graph starts with its input placeholder
graph_def = TransformGraph(graph_def, ['image_tensor'],
                           ['detection_boxes', 'detection_classes', 'detection_scores', 'num_detections'],
                           ['sort_by_execution_order'])

with tf.gfile.FastGFile('ssdlite_new.pb', 'wb') as f:
    f.write(graph_def.SerializeToString())  # save the new model

Then run the conversion code from models again to obtain the pbtxt file.
