Getting Started with PaddlePaddle Image Recognition in 100 Lines of Code (A Painless Code Walkthrough)

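Introduction: PaddlePaddle, developed by Baidu, is China's first open-source deep learning framework. After taking any number of machine learning courses, you may still have no idea where to start when you have to write a deep learning program by hand. This article aims to solve that getting-started problem. It is meant for readers who know the basics of deep learning but have not written programs for it, or who already use another deep learning framework and want to learn how PaddlePaddle is used. We will walk through turning the ideas and models in your head into code with the PaddlePaddle framework.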

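Imagine you want to implement a deep learning program for image classification. Which modules are essential? The first that comes to mind is a module that describes and defines the network structure. In this article we use VGG as the network, so the first module is vgg_bn_drop. Given this network module, it follows that we need an inference program that drives the network to produce an output, which we will call predict. So the second module is the inference program (inference_program). Once we have predict, training naturally requires comparing it with the label from the dataset and computing the difference with a loss function. The third module is therefore a program that combines the predict instance, the label definition, and the cost computation. PaddlePaddle calls it train_func; here we name our third module train_program. The cost defined in the third module measures the gap between the inference made with the current parameters and the label, and that gap drives the adjustment of the network's parameters, so we need an optimizer to make the adjustment. The fourth module is therefore the optimizer. These four modules together define the full flow of the network, from forward inference to backward adjustment.
[Figure: structure diagram of the modules described above]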
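
Before turning to PaddlePaddle itself, the four-module split can be tried out on a toy problem. The sketch below is plain Python with a one-parameter "network"; every name in it is illustrative and none of it is PaddlePaddle API. It only shows how the modules hand results to one another.

```python
# Toy stand-ins for the four modules. All names are illustrative, not PaddlePaddle API.

def network(x, weight):
    """Module 1: the network structure (here a single multiplication)."""
    return weight * x


def inference_program(x, weight):
    """Module 2: drive the network to produce a prediction."""
    return network(x, weight)


def train_program(x, label, weight):
    """Module 3: compare the prediction with the label via a loss (squared error)."""
    predict = inference_program(x, weight)
    return (predict - label) ** 2


def optimizer_step(x, label, weight, learning_rate=0.1):
    """Module 4: move the parameter against the gradient of the cost."""
    gradient = 2 * (inference_program(x, weight) - label) * x
    return weight - learning_rate * gradient


def fit(x, label, steps=50):
    """The driver: push the same sample through the loop repeatedly."""
    weight = 0.0
    for _ in range(steps):
        weight = optimizer_step(x, label, weight)
    return weight


learned_weight = fit(x=1.0, label=3.0)
print(learned_weight, train_program(1.0, 3.0, learned_weight))
```

The learned weight approaches 3.0, the value that makes the prediction match the label. In the real program the framework derives the gradient for us; here it is written out by hand.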

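With the framework defined, we need a program to drive it. This driver also feeds data into the framework so that the data flows through it (which is where the name Fluid comes from). In PaddlePaddle, the Trainer class provides this functionality. After that we only need to prepare the data in reader format, and then we can call the train function of Trainer to run the training.
A journey of a thousand miles begins with a single step. Let's see how to write the first piece of code.
1. The first step starts with importing the libraries we need: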

from __future__ import print_function  # Python 3 style print; __future__ imports must come first
import paddle
import paddle.fluid as fluid
import numpy
import sys

Then, naturally, we implement our first module, the network structure definition. We define a function named vgg_bn_drop:

def vgg_bn_drop(input):

Let's look at the VGG network structure:
[Figure: VGG16 network structure]

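VGG contains many repeated parts. If we group the repeated convolution operations together, the convolutional part of VGG splits into five groups. In PaddlePaddle, a run of consecutive convolutions like this can be implemented with the img_conv_group function.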

    def conv_block():  # parameters left blank for now
        return fluid.nets.img_conv_group()  # arguments are filled in below

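img_conv_group is a composite function that bundles convolution layers, a pooling layer, BatchNorm, and Dropout, and it conveniently supports consecutive convolutions. What do we need to define for each group of consecutive convolutions? First, it must accept an input. For the convolution layers, we need the kernel size, the number of kernels, and the activation function. For the pooling layer, we need the pooling window size, the pooling stride, and the pooling method. For Dropout we supply a dropout probability, and img_conv_group also has a switch that turns BatchNorm on or off, which we need to set. So our call to img_conv_group looks like this: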

        return fluid.nets.img_conv_group(
            input=ipt,
            conv_filter_size=3,
            conv_num_filter=[num_filter] * groups,
            conv_act='relu',
            pool_size=2,
            pool_stride=2,
            pool_type='max',
            conv_with_batchnorm=True,
            conv_batchnorm_drop_rate=dropouts)

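The VGG diagram shows that every convolution layer uses a 3*3 kernel, so we pass the single value 3. If two values are given here, PaddlePaddle treats them as the height and width of a rectangular kernel. The conv_num_filter parameter takes the number of kernels for every convolution in the group, one entry per layer, so that all layers are set up in one go; this is why we multiply [num_filter] by groups. According to the paper, the activation function is 'relu'. The diagram shows that pooling halves the feature map, so pool_size and pool_stride are both 2. Then we select max pooling, turn on BatchNorm, and set the dropout probabilities. Note that the dropout argument must be a Python list holding the dropout probability of each convolution layer in the group. That completes the img_conv_group call. Setting aside the hard-coded arguments, input, num_filter, groups, and dropouts must be passed in from the enclosing function, so the signature of conv_block is: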

    def conv_block(ipt, num_filter, groups, dropouts):
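
To make the list-valued arguments concrete, here is a small sketch in plain Python that expands the conv_block arguments into one (kernel count, dropout rate) pair per convolution layer. conv_group_config is a hypothetical helper written for this illustration; it is not part of PaddlePaddle and only mirrors the `[num_filter] * groups` expression used above.

```python
def conv_group_config(num_filter, groups, dropouts):
    """Return one (num_filters, dropout_rate) pair per conv layer in the group."""
    # Same expression that is passed to conv_num_filter in the article.
    filters = [num_filter] * groups
    if len(dropouts) != len(filters):
        raise ValueError("need exactly one dropout rate per conv layer")
    return list(zip(filters, dropouts))


# Third VGG block from the article: three 256-kernel convolutions.
block3 = conv_group_config(256, 3, [0.4, 0.4, 0])
print(block3)
```

The length check makes the pairing rule explicit: a group of three convolutions needs three dropout entries, with 0 meaning no dropout after that layer.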

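That completes the definition of the consecutive convolutions. But defining conv_block is not enough on its own: we still need to assemble the blocks in the shape of the VGG model. The assembly code for the convolution layers is: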

    conv1 = conv_block(input, 64, 2, [0.3, 0])
    conv2 = conv_block(conv1, 128, 2, [0.4, 0])
    conv3 = conv_block(conv2, 256, 3, [0.4, 0.4, 0])
    conv4 = conv_block(conv3, 512, 3, [0.4, 0.4, 0])
    conv5 = conv_block(conv4, 512, 3, [0.4, 0.4, 0])

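From the second block on, each block takes the previous block's output as its input. The second argument sets the number of output channels of the block according to the VGG structure, the third sets how many consecutive convolutions it contains, and the fourth sets the dropout probabilities; the last convolution in each block applies no dropout. According to the network structure, three fully connected layers follow, defined like this: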

    drop = fluid.layers.dropout(x=conv5, dropout_prob=0.5)
    fc1 = fluid.layers.fc(input=drop, size=512, act=None)
    bn = fluid.layers.batch_norm(input=fc1, act='relu')
    drop2 = fluid.layers.dropout(x=bn, dropout_prob=0.5)
    fc2 = fluid.layers.fc(input=drop2, size=512, act=None)
    predict = fluid.layers.fc(input=fc2, size=10, act='softmax')

This uses PaddlePaddle's built-in operators: the fully connected layer layers.fc, batch_norm, and dropout. The complete vgg_bn_drop code is:

def vgg_bn_drop(input):
    def conv_block(ipt, num_filter, groups, dropouts):
        return fluid.nets.img_conv_group(
            input=ipt,
            pool_size=2,
            pool_stride=2,
            conv_num_filter=[num_filter] * groups,
            conv_filter_size=3,
            conv_act='relu',
            conv_with_batchnorm=True,
            conv_batchnorm_drop_rate=dropouts,
            pool_type='max')

    conv1 = conv_block(input, 64, 2, [0.3, 0])
    conv2 = conv_block(conv1, 128, 2, [0.4, 0])
    conv3 = conv_block(conv2, 256, 3, [0.4, 0.4, 0])
    conv4 = conv_block(conv3, 512, 3, [0.4, 0.4, 0])
    conv5 = conv_block(conv4, 512, 3, [0.4, 0.4, 0])

    drop = fluid.layers.dropout(x=conv5, dropout_prob=0.5)
    fc1 = fluid.layers.fc(input=drop, size=512, act=None)
    bn = fluid.layers.batch_norm(input=fc1, act='relu')
    drop2 = fluid.layers.dropout(x=bn, dropout_prob=0.5)
    fc2 = fluid.layers.fc(input=drop2, size=512, act=None)
    predict = fluid.layers.fc(input=fc2, size=10, act='softmax')
    return predict
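
It can help to trace how the five blocks shrink a 32x32 input. The sketch below is plain Python, not PaddlePaddle code. It assumes that the 3x3 convolutions are padded so they keep the spatial size, leaving the 2x2 pooling with stride 2 as the only step that changes it; pooled_size and trace_vgg_shapes are hypothetical helper names.

```python
def pooled_size(size, pool_size=2, pool_stride=2):
    """Output length of one spatial dimension after pooling without padding."""
    return (size - pool_size) // pool_stride + 1


def trace_vgg_shapes(height=32, width=32):
    """Return (channels, height, width) after each of the five conv blocks."""
    block_filters = [64, 128, 256, 512, 512]  # second argument of each conv_block call
    shapes = []
    for num_filter in block_filters:
        # Assumption: the convolutions preserve height and width; only pooling halves them.
        height = pooled_size(height)
        width = pooled_size(width)
        shapes.append((num_filter, height, width))
    return shapes


for index, shape in enumerate(trace_vgg_shapes(), start=1):
    print("conv%d output: %s" % (index, shape))
```

Under that assumption the last block outputs 512 channels of size 1x1, so the first fully connected layer sees 512 values per image.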

2. With the network structure defined, we need to catch its output, predict, and get its bottle (the input format) ready. So we define

def inference_program():

In this function we first catch the output:

    predict = vgg_bn_drop(images)

and then supply the bottle:

    data_shape = [3, 32, 32]
    images = fluid.layers.data(name='pixel', shape=data_shape, dtype='float32')
    predict = vgg_bn_drop(images)  # use the VGG network defined above

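In PaddlePaddle, image data, tensor data, and label data can all be held in a layers.data container. The name argument of the data function can be any name you choose. This experiment uses CIFAR-10 data, which consists of 3-channel 32x32 images, so the shape is [3, 32, 32]. The inference_program code is therefore: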

def inference_program():
    # The image is 32 * 32 with RGB representation.
    data_shape = [3, 32, 32]
    images = fluid.layers.data(name='pixel', shape=data_shape, dtype='float32')

    # predict = resnet_cifar10(images, 32)  # ResNet alternative, not defined in this article
    predict = vgg_bn_drop(images)  # use the VGG network defined above
    return predict

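3. Following the plan from the beginning, once we have the inference module we need to compare predict against the label, so we need a train_program. The job of train_program is to define the label, compute the loss, and compute the accuracy. It hands the average cost and accuracy of each batch to the optimizer in the next step. train_program is defined as follows: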

def train_program():
    predict = inference_program()

    label = fluid.layers.data(name='label', shape=[1], dtype='int64')
    cost = fluid.layers.cross_entropy(input=predict, label=label)
    avg_cost = fluid.layers.mean(cost)
    accuracy = fluid.layers.accuracy(input=predict, label=label)
    return [avg_cost, accuracy]
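
For a feel of what train_program computes, here is a sketch of the same three quantities in plain Python on a made-up batch of two samples with three classes. It illustrates the arithmetic (cross entropy per sample, mean over the batch, top-1 accuracy) and is not PaddlePaddle's implementation; the function names and the numbers are assumptions chosen for the example.

```python
import math


def cross_entropy(probabilities, label):
    """Negative log of the probability assigned to the correct class."""
    return -math.log(probabilities[label])


def batch_metrics(batch_probabilities, labels):
    """Return (average cost, accuracy) for one batch."""
    costs = [cross_entropy(p, y) for p, y in zip(batch_probabilities, labels)]
    avg_cost = sum(costs) / len(costs)
    # A sample counts as correct when its most probable class equals the label.
    correct = sum(1 for p, y in zip(batch_probabilities, labels)
                  if p.index(max(p)) == y)
    return avg_cost, correct / float(len(labels))


# Made-up softmax outputs: the first sample is classified correctly, the second is not.
batch_probabilities = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
labels = [0, 2]
avg_cost, accuracy = batch_metrics(batch_probabilities, labels)
print("avg_cost=%.4f accuracy=%.2f" % (avg_cost, accuracy))
```

The misclassified sample dominates the average cost because the correct class only received probability 0.1.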

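Once we have the cost, training needs to adjust the network's parameters backward based on it. The module that performs this backward adjustment is called optimizer_program. All it needs to do is return the chosen optimizer (the learning-rate hyperparameter is set here):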

def optimizer_program():
    return fluid.optimizer.Adam(learning_rate=0.001)
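
To see what an Adam optimizer does with the gradient of the cost, here is a sketch of one update step for a single scalar parameter, following the textbook form of the algorithm. It is written in plain Python for illustration; adam_step is a hypothetical helper, the beta and epsilon defaults are the commonly quoted ones, and PaddlePaddle's own implementation may differ in details.

```python
import math


def adam_step(param, grad, m, v, t, learning_rate=0.001,
              beta1=0.9, beta2=0.999, epsilon=1e-8):
    """Apply one Adam update; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction for the zero start
    v_hat = v / (1 - beta2 ** t)
    param = param - learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)
    return param, m, v


param, m, v = adam_step(param=1.0, grad=2.0, m=0.0, v=0.0, t=1)
print(param)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by roughly the learning rate, here about 0.001.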

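4. With the four modules above, the directed cycle of inference, training, and adjustment is complete. We are now like someone who has finished laying the water pipes and needs to turn on the water. So what is the central controller of this whole loop? It is fluid.Trainer. In PaddlePaddle, fluid.Trainer is a fairly high-level API: you only need to instantiate the fluid.Trainer class and call the instance's .train() method to start training. That involves two steps: instantiating the object and starting the training.

5. Instantiating the object requires three arguments: train_func, optimizer_func, and place. train_func is the train_program we just defined; it contains everything about the forward inference and the cost, so we simply pass train_program as train_func. The optimizer works the same way: pass optimizer_program as optimizer_func. place specifies the device the training program runs on, which here is either the CPU or the GPU. To keep the program tidy, we set up a switch that selects the device: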

use_cuda = False
place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()

Now we can write the Trainer instantiation:

use_cuda = False
place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
trainer = fluid.Trainer(train_func=train_program,optimizer_func=optimizer_program,place=place)

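6. Having read this far, are you itching to run trainer.train() and get it going? Not so fast. So far we have only defined the containers for the data (the cups) and have not written any code that reads real data into the network (the water). So we still need a module for reading and preprocessing data. Image recognition networks train on images batch by batch, so we clearly need two operations: reading the data and batching it. PaddlePaddle's dataset package provides APIs for many public datasets; a single call loads one and returns it as a Python reader. Our flow is therefore to read the data through the API, shuffle it, and then split it into batches with the batch function (the BATCH_SIZE hyperparameter is set here):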

# Each batch will yield 128 images
BATCH_SIZE = 128

# Reader for training
train_reader = paddle.batch(
    paddle.reader.shuffle(paddle.dataset.cifar.train10(), buf_size=50000),
    batch_size=BATCH_SIZE)

# Reader for testing. A separate data set for testing.
test_reader = paddle.batch(
    paddle.dataset.cifar.test10(), batch_size=BATCH_SIZE)
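
The reader idea above can be sketched in plain Python: a reader is a function that returns a generator of samples, and shuffle and batch are decorators that wrap one reader into another. The functions below are simplified stand-ins written for this illustration (including the fixed seed and the toy data); they are not the PaddlePaddle implementations.

```python
import random


def toy_reader():
    """A reader: calling it yields (features, label) samples one by one."""
    for i in range(10):
        yield ([float(i)] * 3, i)


def shuffle(reader, buf_size, seed=0):
    """Wrap a reader so samples are shuffled within buffers of buf_size."""
    def shuffled_reader():
        rng = random.Random(seed)
        buf = []
        for sample in reader():
            buf.append(sample)
            if len(buf) >= buf_size:
                rng.shuffle(buf)
                for item in buf:
                    yield item
                buf = []
        rng.shuffle(buf)  # flush whatever is left in the buffer
        for item in buf:
            yield item
    return shuffled_reader


def batch(reader, batch_size):
    """Wrap a reader so it yields lists of batch_size samples."""
    def batch_reader():
        current = []
        for sample in reader():
            current.append(sample)
            if len(current) == batch_size:
                yield current
                current = []
        if current:  # last, possibly smaller, batch
            yield current
    return batch_reader


batches = list(batch(shuffle(toy_reader, buf_size=5), batch_size=4)())
print([len(b) for b in batches])
```

Ten samples with a batch size of 4 give two full batches and one batch of 2. A shuffle buffer as large as the dataset, like buf_size=50000 for the 50,000 CIFAR-10 training images, shuffles across the whole set.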

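7. Think that having data means we can run? Naive~ The trainer requires an event handler function before it can run. This function is used to observe and debug parameters and to save the model parameters. Here we plot the cost to watch how it changes during training: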

params_dirname = "image_classification_resnet.inference.model"

from paddle.v2.plot import Ploter

train_title = "Train cost"
test_title = "Test cost"
cost_ploter = Ploter(train_title, test_title)

step = 0

def event_handler_plot(event):
    global step
    if isinstance(event, fluid.EndStepEvent):
        cost_ploter.append(train_title, step, event.metrics[0])
        cost_ploter.plot()
        step += 1
    if isinstance(event, fluid.EndEpochEvent):
        avg_cost, accuracy = trainer.test(
            reader=test_reader, feed_order=['pixel', 'label'])
        cost_ploter.append(test_title, step, avg_cost)

        # save parameters
        if params_dirname is not None:
            trainer.save_params(params_dirname)

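paddle.v2.plot makes it easy to plot values in an IPython notebook; the core call is just cost_ploter.append(). The heart of this block is the event handler function event_handler_plot, which does different things after each training batch and after each training epoch. After each batch finishes, the handler receives an object of class fluid.EndStepEvent. After each epoch finishes, it receives an object of class fluid.EndEpochEvent. isinstance tells us which event class the event is an instance of. Finally, at the end of each epoch we save the model to the path given on the first line.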
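
The event mechanism can be sketched without PaddlePaddle. In the plain Python below, run_training is a made-up stand-in for the trainer's loop, the two event classes only borrow the names of the fluid events, and the cost values are fabricated for the illustration. The sketch shows the order in which a handler receives step and epoch events.

```python
class EndStepEvent(object):
    """Stand-in event fired after every batch."""
    def __init__(self, epoch, step, metrics):
        self.epoch = epoch
        self.step = step
        self.metrics = metrics


class EndEpochEvent(object):
    """Stand-in event fired after every pass over the data."""
    def __init__(self, epoch):
        self.epoch = epoch


def run_training(num_epochs, steps_per_epoch, event_handler):
    """Hypothetical training loop that only dispatches events."""
    for epoch in range(num_epochs):
        for step in range(steps_per_epoch):
            fake_cost = 1.0 / (1 + epoch * steps_per_epoch + step)  # fabricated value
            event_handler(EndStepEvent(epoch, step, [fake_cost]))
        event_handler(EndEpochEvent(epoch))


log = []


def event_handler(event):
    if isinstance(event, EndStepEvent):
        log.append(("step", event.epoch, event.step))
    elif isinstance(event, EndEpochEvent):
        log.append(("epoch", event.epoch))


run_training(num_epochs=2, steps_per_epoch=3, event_handler=event_handler)
print(log)
```

Each epoch produces three step entries followed by one epoch entry, which is why per-batch work such as plotting goes in the step branch and per-epoch work such as testing and saving goes in the epoch branch.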
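8. Now we can run the training. Calling trainer.train takes four required arguments: reader (the data stream in Python reader format), num_epochs (the hyperparameter for how many passes to make over the dataset), event_handler (the event handler function), and feed_order (the names of the data layers that hold the training data and labels). The code is: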

trainer.train(reader=train_reader,num_epochs=2,event_handler=event_handler_plot,feed_order=['pixel', 'label'])

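After a short wait you will see the cost value being plotted as it changes:
[Figure: training cost curve. The horizontal axis is the number of training batches; the vertical axis is the mean cost per batch.]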

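9. After one epoch over the dataset, the model is saved at the path we specified. How do we use this model for prediction? The prediction code in PaddlePaddle is simple: instantiate an inference engine with inferencer = fluid.Inferencer(), then start it with inferencer.infer(). So which arguments does the inference engine need before it starts? First, the inference program is essential, and we can reuse the inference_program we wrote earlier. Next, we specify params_dirname, the path where the model was just saved. Finally, as with the trainer, we specify the device, place. The code is: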

inferencer = fluid.Inferencer(infer_func=inference_program, param_path=params_dirname, place=place)
results = inferencer.infer({'pixel': img})

What we get from results is a list of probabilities, one per class. cifar.train10 is a 10-class dataset, so let's describe the class names in human language:

label_list = ["airplane", "automobile", "bird", "cat", "deer", "dog", "frog", "horse", "ship", "truck"]

Then we use np.argmax to find the position of the highest probability in results and print the corresponding label, which gives the classification we want:

print("infer results: %s" % label_list[np.argmax(results[0])])
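
The lookup can be tried without a trained model. The sketch below uses plain Python and a made-up probability vector in place of results[0]; top_label is a hypothetical helper that does the same job as np.argmax followed by indexing into label_list.

```python
label_list = ["airplane", "automobile", "bird", "cat", "deer",
              "dog", "frog", "horse", "ship", "truck"]


def top_label(probabilities, labels):
    """Return the label with the highest probability and that probability."""
    best_index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return labels[best_index], probabilities[best_index]


# Made-up probabilities for illustration; a real run would take them from results[0].
fake_probabilities = [0.01, 0.02, 0.03, 0.04, 0.05, 0.70, 0.05, 0.04, 0.03, 0.03]
name, confidence = top_label(fake_probabilities, label_list)
print("infer results: %s (%.2f)" % (name, confidence))
```

Because label_list is indexed from 0, the position of the maximum is used directly as the index, with no offset.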

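Think we're done? Look closely at results = inferencer.infer({'pixel': img}): we have not yet defined or processed the image to predict. To match PaddlePaddle's format, the image must be in CHW order (channel, height, width), and each pixel value must be normalized; dividing by 255, as the code below does, maps it into the closed interval [0, 1]. We use the PIL package (Pillow, a third-party library) to process the input image: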

# Prepare testing data.
from PIL import Image
import numpy as np
import os

def load_image(file):
    im = Image.open(file)
    # Image.ANTIALIAS was removed in Pillow 10; newer versions use Image.LANCZOS.
    im = im.resize((32, 32), Image.ANTIALIAS)
    im = np.array(im).astype(np.float32)  # convert to float32
    im = im.transpose((2, 0, 1))  # reorder HWC to CHW
    im = im / 255.0  # scale into the [0, 1] range
    # Add one dimension to mimic the list format.
    im = np.expand_dims(im, axis=0)
    return im

cur_dir = os.getcwd()  # current working directory, for building an absolute path
img = load_image('./image/dog.png')  # path of the image to predict
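
The reordering and scaling steps can be checked by hand on a tiny image. The sketch below uses plain Python nested lists and a made-up 2x2 RGB image instead of a file; hwc_to_chw is a hypothetical helper that has the same effect as transpose((2, 0, 1)) followed by dividing by 255.

```python
def hwc_to_chw(image):
    """Convert a height x width x channel nested list to channel x height x width,
    scaling 0..255 pixel values into 0..1."""
    height = len(image)
    width = len(image[0])
    channels = len(image[0][0])
    return [[[image[y][x][c] / 255.0 for x in range(width)]
             for y in range(height)]
            for c in range(channels)]


# Made-up 2x2 image: red, green on the top row; blue, white on the bottom row.
tiny_image = [[[255, 0, 0], [0, 255, 0]],
              [[0, 0, 255], [255, 255, 255]]]
chw = hwc_to_chw(tiny_image)
print(chw[0])  # red channel as a 2x2 grid
```

The red channel is 1.0 only where the red and white pixels sit, which confirms that each output plane holds one color channel of the whole image.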

So our complete prediction code is:

# Prepare testing data.
from PIL import Image
import numpy as np
import os

def load_image(file):
    im = Image.open(file)
    im = im.resize((32, 32), Image.ANTIALIAS)
    im = np.array(im).astype(np.float32)
    # The storage order of the loaded image is H(height),
    # W(width), C(channel). PaddlePaddle requires
    # the CHW order, so transpose them.
    im = im.transpose((2, 0, 1))  # CHW
    im = im / 255.0  # scale into [0, 1]
    # Add one dimension to mimic the list format.
    im = np.expand_dims(im, axis=0)
    return im

cur_dir = os.getcwd()
img = load_image(cur_dir + '/03.image_classification/image/dog.png')
# img = load_image('./image/dog.png')

inferencer = fluid.Inferencer(
    infer_func=inference_program, param_path=params_dirname, place=place)

label_list = ["airplane", "automobile", "bird", "cat", "deer",
              "dog", "frog", "horse", "ship", "truck"]

# inference
results = inferencer.infer({'pixel': img})
# print(results[0])
# label_list is 0-indexed, so the argmax position is used directly.
print("infer results: %s" % label_list[np.argmax(results[0])])

10. Appendix: the complete code for the whole process:

from __future__ import print_function  # must be the first statement in the file
import paddle
import paddle.fluid as fluid
import numpy
import sys


def vgg_bn_drop(input):
    def conv_block(ipt, num_filter, groups, dropouts):
        return fluid.nets.img_conv_group(
            input=ipt,
            pool_size=2,
            pool_stride=2,
            conv_num_filter=[num_filter] * groups,
            conv_filter_size=3,
            conv_act='relu',
            conv_with_batchnorm=True,
            conv_batchnorm_drop_rate=dropouts,
            pool_type='max')

    conv1 = conv_block(input, 64, 2, [0.3, 0])
    conv2 = conv_block(conv1, 128, 2, [0.4, 0])
    conv3 = conv_block(conv2, 256, 3, [0.4, 0.4, 0])
    conv4 = conv_block(conv3, 512, 3, [0.4, 0.4, 0])
    conv5 = conv_block(conv4, 512, 3, [0.4, 0.4, 0])

    drop = fluid.layers.dropout(x=conv5, dropout_prob=0.5)
    fc1 = fluid.layers.fc(input=drop, size=512, act=None)
    bn = fluid.layers.batch_norm(input=fc1, act='relu')
    drop2 = fluid.layers.dropout(x=bn, dropout_prob=0.5)
    fc2 = fluid.layers.fc(input=drop2, size=512, act=None)
    predict = fluid.layers.fc(input=fc2, size=10, act='softmax')
    return predict


def inference_program():
    # The image is 32 * 32 with RGB representation.
    data_shape = [3, 32, 32]
    images = fluid.layers.data(name='pixel', shape=data_shape, dtype='float32')

    # predict = resnet_cifar10(images, 32)  # ResNet alternative, not defined in this article
    predict = vgg_bn_drop(images)  # use the VGG network defined above
    return predict


def train_program():
    predict = inference_program()

    label = fluid.layers.data(name='label', shape=[1], dtype='int64')
    cost = fluid.layers.cross_entropy(input=predict, label=label)
    avg_cost = fluid.layers.mean(cost)
    accuracy = fluid.layers.accuracy(input=predict, label=label)
    return [avg_cost, accuracy]


def optimizer_program():
    return fluid.optimizer.Adam(learning_rate=0.001)


use_cuda = False
place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
trainer = fluid.Trainer(
    train_func=train_program,
    optimizer_func=optimizer_program,
    place=place)

# Each batch will yield 128 images
BATCH_SIZE = 128

# Reader for training
train_reader = paddle.batch(
    paddle.reader.shuffle(paddle.dataset.cifar.train10(), buf_size=50000),
    batch_size=BATCH_SIZE)

# Reader for testing. A separate data set for testing.
test_reader = paddle.batch(
    paddle.dataset.cifar.test10(), batch_size=BATCH_SIZE)

params_dirname = "image_classification_resnet.inference.model"

from paddle.v2.plot import Ploter

train_title = "Train cost"
test_title = "Test cost"
cost_ploter = Ploter(train_title, test_title)

step = 0

def event_handler_plot(event):
    global step
    if isinstance(event, fluid.EndStepEvent):
        cost_ploter.append(train_title, step, event.metrics[0])
        cost_ploter.plot()
        step += 1
    if isinstance(event, fluid.EndEpochEvent):
        avg_cost, accuracy = trainer.test(
            reader=test_reader, feed_order=['pixel', 'label'])
        cost_ploter.append(test_title, step, avg_cost)

        # save parameters
        if params_dirname is not None:
            trainer.save_params(params_dirname)


trainer.train(
    reader=train_reader,
    num_epochs=2,
    event_handler=event_handler_plot,
    feed_order=['pixel', 'label'])

# Prepare testing data.
from PIL import Image
import numpy as np
import os

def load_image(file):
    im = Image.open(file)
    im = im.resize((32, 32), Image.ANTIALIAS)
    im = np.array(im).astype(np.float32)
    # The storage order of the loaded image is H(height),
    # W(width), C(channel). PaddlePaddle requires
    # the CHW order, so transpose them.
    im = im.transpose((2, 0, 1))  # CHW
    im = im / 255.0  # scale into [0, 1]
    # Add one dimension to mimic the list format.
    im = np.expand_dims(im, axis=0)
    return im

cur_dir = os.getcwd()
img = load_image(cur_dir + '/03.image_classification/image/dog.png')
# img = load_image('./image/dog.png')

inferencer = fluid.Inferencer(
    infer_func=inference_program, param_path=params_dirname, place=place)

label_list = ["airplane", "automobile", "bird", "cat", "deer",
              "dog", "frog", "horse", "ship", "truck"]

# inference
results = inferencer.infer({'pixel': img})
# print(results[0])
# label_list is 0-indexed, so the argmax position is used directly.
print("infer results: %s" % label_list[np.argmax(results[0])])
