OpenVINO 2021r2 C++: Super-Resolution Reconstruction with FSRCNN

2023-10-29 13:38


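I recently upgraded OpenVINO to the latest version. (This is the one thing I really dislike about OpenVINO: every upgrade changes a few interfaces. The API does stay backward compatible for a few versions, but keeping up is exhausting. OpenCV and FFmpeg are the same. Do all open-source projects work this way?) While I was at it, I wanted to see how well the latest OpenVINO supports image super-resolution models.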

 

Let's start with FSRCNN. It is a classic image super-resolution model: low compute cost, fast inference, and good upscaling quality.

 

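Looking at the implementation at https://www.github.com/Saafke/FSRCNN_Tensorflow, the FSRCNN model processes only the Y channel of the image. The Y channel is divided by 255.0 to obtain floats in [0,1] and is then upscaled 2x by the network. The inference output is multiplied by 255.0 and clipped to [0,255] to form the output Y channel. The Cb and Cr channels are simply upscaled 2x with bicubic interpolation, and the three channels are finally merged and converted back to a BGR image.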

    # Excerpt from the repository; relies on cv2, numpy (np) and TensorFlow 1.x (tf).
    def upscale(self, path):
        """Upscales an image via model."""
        img = cv2.imread(path, 3)

        # Convert BGR to YCrCb
        img_ycc = cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb)
        img_y = img_ycc[:,:,0]

        # Convert the Y channel to floats in [0,1]
        floatimg = img_y.astype(np.float32) / 255.0
        LR_input_ = floatimg.reshape(1, floatimg.shape[0], floatimg.shape[1], 1)

        with tf.Session(config=self.config) as sess:
            print("\nUpscale image by a factor of {}:\n".format(self.scale))

            # load and run
            ckpt_name = self.ckpt_path + "fsrcnn_ckpt" + ".meta"
            saver = tf.train.import_meta_graph(ckpt_name)
            saver.restore(sess, tf.train.latest_checkpoint(self.ckpt_path))
            graph_def = sess.graph
            LR_tensor = graph_def.get_tensor_by_name("IteratorGetNext:0")
            HR_tensor = graph_def.get_tensor_by_name("NHWC_output:0")

            # Run inference
            output = sess.run(HR_tensor, feed_dict={LR_tensor: LR_input_})

            # post-process
            Y = output[0]

            # Multiply the output Y channel by 255.0 and clip it to [0,255]
            Y = (Y * 255.0).clip(min=0, max=255)
            Y = (Y).astype(np.uint8)

            # Upscale Cb and Cr with bicubic interpolation
            # Merge with Chrominance channels Cr/Cb
            Cr = np.expand_dims(cv2.resize(img_ycc[:,:,1], None, fx=self.scale, fy=self.scale, interpolation=cv2.INTER_CUBIC), axis=2)
            Cb = np.expand_dims(cv2.resize(img_ycc[:,:,2], None, fx=self.scale, fy=self.scale, interpolation=cv2.INTER_CUBIC), axis=2)

            # Convert YCrCb back to BGR
            HR_image = (cv2.cvtColor(np.concatenate((Y, Cr, Cb), axis=2), cv2.COLOR_YCrCb2BGR))
            bicubic_image = cv2.resize(img, None, fx=self.scale, fy=self.scale, interpolation=cv2.INTER_CUBIC)

            cv2.imshow('Original image', img)
            cv2.imshow('HR image', HR_image)
            cv2.imshow('Bicubic HR image', bicubic_image)
            cv2.waitKey(0)
            sess.close()
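The Y-channel preparation described above can also be written without OpenCV. The following is only a minimal sketch: bgrToNormalizedLuma is a hypothetical helper that does not appear in the repository, and it assumes an interleaved 8-bit BGR buffer and the BT.601 luma weights that OpenCV documents for its BGR to YCrCb conversion.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the original repository): takes an
// interleaved 8-bit BGR buffer and returns the luma channel scaled to [0, 1].
std::vector<float> bgrToNormalizedLuma(const std::vector<std::uint8_t>& bgr) {
    assert(bgr.size() % 3 == 0);  // expects whole B,G,R triplets
    std::vector<float> y(bgr.size() / 3);
    for (std::size_t i = 0; i < y.size(); ++i) {
        const float b = bgr[3 * i + 0];
        const float g = bgr[3 * i + 1];
        const float r = bgr[3 * i + 2];
        // BT.601 weights; the result is in [0, 255].
        const float luma = 0.299f * r + 0.587f * g + 0.114f * b;
        y[i] = luma / 255.0f;  // FSRCNN expects floats in [0, 1]
    }
    return y;
}
```

Unlike cv2.cvtColor, this sketch keeps the luma as a float instead of rounding it to 8 bits first, so its values can differ slightly from the Python path.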

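For an OpenVINO implementation of any super-resolution model, as long as the Model Optimizer (MO) converts the model correctly, the inference part is rarely a problem. What needs attention is the preprocessing of the input data: whether the model expects floats in [0,1] or in [-1,1], and whether mean subtraction or scaling has to be applied to the input. This preprocessing can be passed as parameters when MO converts the model to IR, so that the Inference Engine (IE) performs it. The other part is the output: how to convert the output floats into RGB/YUV pixels in [0,255]. That conversion has to be written by hand.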
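The two input ranges mentioned above differ only in the mean and scale that are applied. As a rough sketch, assuming the usual (pixel - mean) / scale convention that the Model Optimizer flags --mean_values and --scale_values follow, the arithmetic looks like this (normalizePixel is a hypothetical name used only for illustration):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: applies (pixel - mean) / scale to a single 8-bit value.
//   mean = 0,     scale = 255   maps [0, 255] to [0, 1]
//   mean = 127.5, scale = 127.5 maps [0, 255] to [-1, 1]
float normalizePixel(std::uint8_t pixel, float mean, float scale) {
    assert(scale != 0.0f);  // a zero scale would divide by zero
    return (static_cast<float>(pixel) - mean) / scale;
}
```

For FSRCNN only the [0,1] case is needed, which is why a single scale value of 255.0 and no mean is enough.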

 

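Now for the MO conversion. I wanted a larger input resolution, so I set the input size to 640x480, which gives a 1280x960 output image. The option --scale_values=[255.0] tells IE to divide every input value by 255.0 during computation.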

C:\temp_20151027\FSRCNN_Tensorflow-master\models>python "c:\Program Files (x86)\IntelSWTools\openvino_2021\deployment_tools\model_optimizer\mo_tf.py" --scale_values=[255.0] --input_shape=[1,480,640,1] --input_model=FSRCNN_x2.pb --data_type FP16 --output=NHWC_output
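The relation between the --input_shape passed to MO and the size of the upscaled result is simple arithmetic. A small sketch, assuming an NHWC shape and an integer scale factor (Shape4 and upscaledShapeNHWC are hypothetical names, not part of OpenVINO):

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Shape4 = std::array<std::size_t, 4>;  // N, H, W, C

// Hypothetical helper: only height and width grow with the scale factor;
// batch size and channel count are unchanged.
Shape4 upscaledShapeNHWC(const Shape4& in, std::size_t factor) {
    assert(factor >= 1);
    return {in[0], in[1] * factor, in[2] * factor, in[3]};
}
```

With the shape [1,480,640,1] from the command above and a factor of 2, this gives [1,960,1280,1], that is, a 1280x960 single-channel image.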

 

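Next comes the C++ implementation. It reuses the inference code from my previous article, "OpenVINO 2020r3: Trying the GPU Remote Blob API"; only the final handling of the output blob is replaced with code that converts the floats to pixels.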

 

    #include <chrono>
    #include <condition_variable>
    #include <iostream>
    #include <mutex>
    #include <string>
    #include <vector>

    #include <inference_engine.hpp>
    #include <opencv2/opencv.hpp>

    using namespace InferenceEngine;
    using namespace std;
    using namespace std::chrono;

    // Source image and its bicubic 2x upscale, shared by loadjpg() and main().
    // (Declarations assumed; the original listing did not show them.)
    static cv::Mat jpg, jpg_2x;

    // loadjpg converts the color image to grayscale, resizes it to the network
    // input size, and saves a bicubic 2x upscale for comparison.
    static void loadjpg(const char * jpgname, int width, int height)
    {
        jpg = cv::imread(jpgname);
        if (jpg.empty()) return;  // main() reports the failure
        cout << "load image: " << jpgname << " resize: w=" << width << " h=" << height << endl;

        // resize to width*height
        std::cout << "convert img to Gray" << std::endl;
        cv::cvtColor(jpg, jpg, cv::COLOR_BGR2GRAY);  // COLOR_BGR2YCrCb or COLOR_BGR2YUV
        cv::resize(jpg, jpg, cv::Size(width, height), 0, 0, cv::INTER_CUBIC);
        cv::resize(jpg, jpg_2x, cv::Size(width * 2, height * 2), 0, 0, cv::INTER_CUBIC);
        cv::imshow("bic_2x", jpg_2x);
        cv::imwrite("palace_gray_bic_2x.png", jpg_2x);
    }

    int main()
    {
        string FLAGS_d = "GPU";  // or "CPU": selects the device used for inference
        string FLAGS_m = "C:\\work\\opencl_2020\\cmake_fsrcnn_ov2021\\src\\FSRCNN_x2_FP16.xml";
        string FLAGS_i = "C:\\work\\opencl_2020\\cmake_fsrcnn_ov2021\\src\\palace.jpg";
        int FLAGS_nt = 10;

        cout << "starting" << endl;
        const Version *IEversion;
        IEversion = GetInferenceEngineVersion();
        cout << "InferenceEngine: API version " << IEversion->apiVersion.major << "." << IEversion->apiVersion.minor << endl;
        cout << "InferenceEngine: Build : " << IEversion->buildNumber << endl << endl;

        // --------------------------- 1. Load inference engine ---------------------------
        cout << "Creating Inference Engine" << endl;
        Core ie;

        // --------------------------- 2. Read IR generated by Model Optimizer (.xml and .bin files) ---
        cout << "Loading network files" << endl;
        /** Read network model **/
        CNNNetwork network = ie.ReadNetwork(FLAGS_m);
        cout << "network layer count: " << network.layerCount() << endl;

        // --------------------------- 3. Configure input & output ------------------------
        // --------------------------- Prepare input blobs --------------------------------
        cout << "Preparing input blobs" << endl;
        /** Taking information about all topology inputs **/
        InputsDataMap inputInfo(network.getInputsInfo());
        if (inputInfo.size() != 1) throw std::logic_error("Sample supports topologies with 1 input only");
        auto inputInfoItem = *inputInfo.begin();

        /** Specifying the precision and layout of input data provided by the user.
         * This should be called before load of the network to the device **/
        inputInfoItem.second->setPrecision(Precision::U8);
        inputInfoItem.second->setLayout(Layout::NCHW);

        // loadjpg converts the RGB image to grayscale, which keeps things simple
        loadjpg(FLAGS_i.c_str(), inputInfoItem.second->getTensorDesc().getDims()[3],
                inputInfoItem.second->getTensorDesc().getDims()[2]);
        if (jpg.data == NULL) {
            cout << "Valid input images were not found!" << endl;
            return 1;
        }

        /** Setting batch size to 1 **/
        network.setBatchSize(1);
        size_t batchSize = network.getBatchSize();
        cout << "Batch size is " << std::to_string(batchSize) << endl;

        // --------------------------- 4. Loading model to the device ---------------------
        cout << "Loading model to the device: " << FLAGS_d << endl;
        ExecutableNetwork executable_network = ie.LoadNetwork(network, FLAGS_d);

        // --------------------------- 5. Create infer request ----------------------------
        cout << "Create infer request" << endl;
        InferRequest inferRequest_regular = executable_network.CreateInferRequest();

        // --------------------------- 6. Prepare input -----------------------------------
        for (auto & item : inputInfo) {
            Blob::Ptr inputBlob = inferRequest_regular.GetBlob(item.first);
            SizeVector dims = inputBlob->getTensorDesc().getDims();
            /** Fill input tensor with images. First b channel, then g and r channels **/
            size_t num_channels = dims[1];
            std::cout << "num_channels = " << num_channels << std::endl;
            size_t image_size = dims[3] * dims[2];

            MemoryBlob::Ptr minput = as<MemoryBlob>(inputBlob);
            if (!minput) {
                cout << "We expect MemoryBlob from inferRequest_regular, but by fact we were not able to cast inputBlob to MemoryBlob" << endl;
                return 1;
            }
            // locked memory holder should be alive all time while access to its buffer happens
            auto minputHolder = minput->wmap();
            auto data = minputHolder.as<PrecisionTrait<Precision::U8>::value_type *>();
            unsigned char* pixels = (unsigned char*)(jpg.data);
            cout << "image_size = " << image_size << endl;

            // Copy the Mat data (interleaved) into the input blob (planar)
            /** Iterate over all pixel in image (b,g,r) **/
            for (size_t pid = 0; pid < image_size; pid++) {
                /** Iterate over all channels **/
                for (size_t ch = 0; ch < num_channels; ++ch) {
                    /** [images stride + channels stride + pixel id] all in bytes **/
                    data[ch * image_size + pid] = pixels[pid * num_channels + ch];
                }
            }
        }

        milliseconds start_ms = duration_cast<milliseconds>(system_clock::now().time_since_epoch());

        // --------------------------- 7. Do inference ------------------------------------
    #if 0
        // for async inference
        size_t numIterations = 10;
        size_t curIteration = 0;
        std::condition_variable condVar;
        inferRequest_regular.SetCompletionCallback([&] {
            curIteration++;
            cout << "Completed " << curIteration << " async request execution" << endl;
            if (curIteration < numIterations) {
                /* here a user can read output containing inference results and put new input
                   to repeat async request again */
                inferRequest_regular.StartAsync();
            }
            else {
                /* continue sample execution after last Asynchronous inference request execution */
                condVar.notify_one();
            }
        });
        /* Start async request for the first time */
        cout << "Start inference (" << numIterations << " asynchronous executions)" << endl;
        inferRequest_regular.StartAsync();
        /* Wait all repetitions of the async request */
        std::mutex mutex;
        std::unique_lock<std::mutex> lock(mutex);
        condVar.wait(lock, [&] { return curIteration == numIterations; });
    #else
        /* Start sync request */
        cout << "Start inference " << endl;
        inferRequest_regular.Infer();
    #endif

        milliseconds end_ms = duration_cast<milliseconds>(system_clock::now().time_since_epoch());
        std::cout << "total cost time: " << (end_ms - start_ms).count() << " ms" << std::endl;
        float total_time = (end_ms - start_ms).count() / 1000.0;
        std::cout << "FPS: " << (float)1.0 / total_time << std::endl;

        // --------------------------- 8. Process output ----------------------------------
        cout << "Processing output blobs" << endl;
        OutputsDataMap outputInfo(network.getOutputsInfo());
        if (outputInfo.size() != 1) throw std::logic_error("Sample supports topologies with 1 output only");
        cout << "output blob name: " << outputInfo.begin()->first << endl;

        MemoryBlob::CPtr moutput = as<MemoryBlob>(inferRequest_regular.GetBlob(outputInfo.begin()->first));
        // Check the cast before the blob is used
        if (!moutput) {
            throw std::logic_error("We expect output to be inherited from MemoryBlob, "
                                   "but by fact we were not able to cast it to MemoryBlob");
        }

        /** Validating -nt value **/
        const size_t resultsCnt = moutput->size() / batchSize;
        if (FLAGS_nt < 1 || static_cast<size_t>(FLAGS_nt) > resultsCnt) {
            cout << "-nt " << FLAGS_nt << " is not available for this network (-nt should be less than "
                 << resultsCnt + 1 << " and more than 0)\n            will be used maximal value : " << resultsCnt << endl;
            FLAGS_nt = static_cast<int>(resultsCnt);
        }

        // locked memory holder should be alive all time while access to its buffer happens
        auto lmoHolder = moutput->rmap();
        const auto output_data = lmoHolder.as<const PrecisionTrait<Precision::FP32>::value_type *>();

        size_t num_images = moutput->getTensorDesc().getDims()[0];
        size_t num_channels = moutput->getTensorDesc().getDims()[1];
        size_t H = moutput->getTensorDesc().getDims()[2];
        size_t W = moutput->getTensorDesc().getDims()[3];
        size_t nPixels = W * H;

        // Process the output blob: convert the output floats to pixels
        std::cout << "Output size [N,C,H,W]: " << num_images << ", " << num_channels << ", " << H << ", " << W << std::endl;
        {
            std::vector<float> data_img(nPixels * num_channels);
            if (num_channels == 1)
            {
                cv::Mat Img(H, W, CV_8U);
                unsigned char *image_ptr = Img.data;
                for (size_t n = 0; n < num_images; n++) {
                    for (size_t i = 0; i < nPixels; i++) {
                        // Scale back to [0,255] and clamp before storing as 8-bit
                        data_img[i] = static_cast<float>(output_data[i + n * nPixels]) * 255.0;
                        if (data_img[i] < 0) data_img[i] = 0;
                        if (data_img[i] > 255) data_img[i] = 255;
                        image_ptr[i] = static_cast<unsigned char>(data_img[i]);
                    }
                }
                cv::imshow("FSRCNN_2x", Img);
                cv::imwrite("palace_FSRCNN_gray_2x.png", Img);
                std::cout << "Output Image created" << std::endl;
                cv::waitKey(0);  // keep the preview windows open until a key is pressed
            }
        }
        return 0;
    }
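The two data-movement steps in the listing above, copying interleaved pixels into a planar blob and turning the float output back into 8-bit pixels, can be isolated from the Inference Engine types. The following is a sketch under that assumption: hwcToChw and toPixels are hypothetical names, and plain std::vector buffers stand in for cv::Mat and the blobs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: reorders interleaved pixels (HWC, e.g. b,g,r,b,g,r,...)
// into planar channels (CHW, e.g. b,b,...,g,g,...,r,r,...).
std::vector<std::uint8_t> hwcToChw(const std::vector<std::uint8_t>& hwc,
                                   std::size_t pixels, std::size_t channels) {
    assert(hwc.size() == pixels * channels);
    std::vector<std::uint8_t> chw(hwc.size());
    for (std::size_t pid = 0; pid < pixels; ++pid) {
        for (std::size_t ch = 0; ch < channels; ++ch) {
            chw[ch * pixels + pid] = hwc[pid * channels + ch];
        }
    }
    return chw;
}

// Hypothetical helper: scales network output from [0, 1] to [0, 255],
// clamps out-of-range values, and truncates to 8 bits.
std::vector<std::uint8_t> toPixels(const std::vector<float>& out) {
    std::vector<std::uint8_t> px(out.size());
    for (std::size_t i = 0; i < out.size(); ++i) {
        const float v = std::min(std::max(out[i] * 255.0f, 0.0f), 255.0f);
        px[i] = static_cast<std::uint8_t>(v);  // truncation, as in the listing
    }
    return px;
}
```

For the single-channel grayscale input used here, the layout conversion is effectively a plain copy; it only matters once more than one channel is fed to the network. Clamping before the cast is what prevents overshoot in the network output from wrapping around when stored as 8-bit.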

The final output:

Original image (test image taken from the internet)

Bicubic 2x upscaling result

FSRCNN 2x result

 

Time taken by the final inferRequest_regular.Infer() call on my machine, an 8665U CPU (4 cores, 8 threads) with a Gen9 24EU integrated GPU:

  • CPU: 68ms (14.71FPS)
  • GPU: 48ms (20.83FPS)
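The FPS figures above are simply 1000 divided by the latency in milliseconds. A small sketch of that calculation together with a generic timing wrapper follows; framesPerSecond and elapsedMs are hypothetical names, and the use of std::chrono::steady_clock is a choice made for this sketch, whereas the listing above uses system_clock.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: converts a per-frame latency in milliseconds to FPS.
double framesPerSecond(double milliseconds) {
    assert(milliseconds > 0.0);
    return 1000.0 / milliseconds;
}

// Hypothetical helper: runs a callable once and returns the elapsed time in
// milliseconds. steady_clock never jumps backwards, which makes it a safer
// choice than system_clock for measuring intervals.
template <typename Fn>
double elapsedMs(Fn&& run) {
    const auto start = std::chrono::steady_clock::now();
    run();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}
```

In the program above, the callable would wrap the inferRequest_regular.Infer() call. Averaging over several runs after a warm-up run would give a more stable figure than a single measurement.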

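So the integrated GPU of an 8th-generation CPU reaches roughly 20 fps. On today's mainstream platform, the 11th-generation Tiger Lake with its Gen12 96EU GPU, I would expect about 3x the performance, which should be enough to build a real-time player that uses FSRCNN for AI restoration of old films.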

 

Finally, here is the source code, for reference only:

https://gitee.com/tisandman/fsrcnn_ov2021

 
