This article walks through a deep-learning face-detection implementation based on the MTCNN algorithm.
Contents
- Results first
- MTCNN
- Main idea
- Cascaded networks
- Image pyramid
- IoU algorithm
- IoU formula
- NMS algorithm
- Sample generation (CelebA)
- Dataset code
- Training code
- Detection code
- Summary
Results first
MTCNN
Since its release in 2016, MTCNN has been genuinely popular in industry. I recently tried reproducing the paper's code.
Main idea
Cascaded networks
The paper proposes a multi-task cascaded convolutional neural network which, as shown in the paper's figure, uses three networks — P-Net, R-Net, and O-Net — to perform detection.
Algorithm steps:
- Feed the image into P-Net, which proposes candidate boxes from its output feature maps.
- Pass P-Net's candidate boxes to R-Net for further filtering.
- Pass R-Net's surviving boxes to O-Net for further filtering.
- O-Net makes the final decision and outputs the face boxes.
These steps run inside an image-pyramid framework: to detect faces at different scales, the image is rescaled repeatedly and each scale is fed through the network. This can raise accuracy, but it also multiplies the amount of computation. A high-level sketch of the cascade follows.
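As a minimal sketch of the control flow (the stage functions `pnet_detect`, `rnet_detect`, and `onet_detect` are hypothetical stand-ins for the `Detector` methods implemented later in this post):

```python
# Minimal sketch of the P/R/O cascade; each stage filters the previous one's boxes.
def mtcnn_detect(image):
    boxes = pnet_detect(image)         # image pyramid + fully-convolutional P-Net proposals
    if len(boxes) == 0:
        return []
    boxes = rnet_detect(image, boxes)  # crop candidates to 24x24, filter with R-Net
    if len(boxes) == 0:
        return []
    return onet_detect(image, boxes)   # crop to 48x48, final decision by O-Net
```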
P-Net:
Since P-Net cannot know in advance how many faces an image contains, it is a fully convolutional network that accepts images of any size, with the channel dimension carrying the output features. The image pyramid rescales the input (keeping the shorter side above 12 pixels), and the convolution stride is 1 so that no candidate position is missed.
The effective 12×12 window determines the smallest face the network can detect; a fixed-size window crops fixed-size patches, so the per-window output is fixed.
The 12×12 sliding windows are realized by convolutions: stacking several small-kernel layers is equivalent in receptive field to a single large-kernel layer, but abstracts better, extracts finer features, uses fewer parameters, and runs faster. Training regresses offsets rather than absolute coordinates, which keeps the computation simple.
Each task gets its own loss function, which makes the learning more targeted. Adding the facial-landmark task increases the complexity of what the network must learn and sharpens the direction of learning; each extra loss strengthens the network's feature extraction and helps optimization.
R-Net:
As with P-Net, stacked small-kernel layers replace a single large-kernel layer with the same receptive field.
Because the input is a fixed 24×24 image, the head can use fully connected layers to extract features (a convolution of matching kernel size would compute the same thing here; see the sketch below).
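To make the "fully connected equals convolution here" claim concrete, here is a small check I added (not part of the original code): for a fixed 3×3 spatial input, `nn.Linear(64*3*3, 128)` on the flattened tensor and `nn.Conv2d(64, 128, 3)` compute the same function once their weights match.

```python
import torch
from torch import nn

x = torch.randn(1, 64, 3, 3)              # fixed-size feature map
fc = nn.Linear(64 * 3 * 3, 128)           # fully connected head
conv = nn.Conv2d(64, 128, kernel_size=3)  # equivalent 3x3 convolution
# Copy the FC weights into the conv kernel; both now compute the same map,
# because flattening follows the same C,H,W order as the kernel layout.
conv.weight.data = fc.weight.data.view(128, 64, 3, 3)
conv.bias.data = fc.bias.data
out_fc = fc(x.reshape(1, -1))              # shape [1, 128]
out_conv = conv(x).reshape(1, -1)          # shape [1, 128]
print(torch.allclose(out_fc, out_conv, atol=1e-6))  # True
```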
O-Net:
Again, stacked small-kernel layers provide the same receptive field as a single large-kernel layer.
Because the input is a fixed 48×48 image, the head likewise uses fully connected layers (equivalent to convolutions of matching size here).
Image pyramid
A single image can contain multiple targets of very different sizes, and a model struggles with targets that are too small or too large, so multi-scale information has to be injected. The per-level scale factor used here is about 0.7 (the detection code below multiplies by 0.7 each level; the reference MTCNN implementation uses 0.709). A sketch of the scale generation follows.
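A minimal sketch of how the pyramid scales come about, assuming the factor 0.7 used in the detection code below (the shorter side must stay above the 12-pixel P-Net window):

```python
def pyramid_scales(w, h, factor=0.7, min_size=12):
    # Collect scales until the shorter side would drop below the P-Net window.
    scales, scale = [], 1.0
    min_side = min(w, h)
    while min_side > min_size:
        scales.append(scale)
        scale *= factor
        min_side = min(w, h) * scale
    return scales

print(pyramid_scales(416, 500))  # e.g. [1.0, 0.7, 0.49, ...]
```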
Two other algorithms have to be mentioned: IoU and NMS.
IoU algorithm
Sample requirements for building the MTCNN training set (a labeling sketch follows the table):

| IoU range | Sample type | Label |
| --- | --- | --- |
| 0 ~ 0.3 | negative (non-face) | confidence 0 (excluded from the offset loss) |
| 0.3 ~ 0.4 | buffer zone (discarded, prevents ambiguous labels) | — |
| 0.4 ~ 0.65 | part face | confidence 2 (excluded from the confidence loss) |
| 0.65 ~ 1 | positive (face) | confidence 1 |
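A sketch of the labeling rule in the table (a hypothetical helper of my own; the sample generators below apply the same thresholds inline, using 0.29 rather than 0.3 for the negative cutoff):

```python
def assign_label(iou):
    # Map a crop's IoU with the ground-truth box to a sample type.
    if iou > 0.65:
        return 1      # positive: face (participates in both losses)
    if iou > 0.4:
        return 2      # part face (excluded from the confidence loss)
    if iou < 0.3:
        return 0      # negative: non-face (excluded from the offset loss)
    return None       # 0.3 ~ 0.4 buffer zone: discard to avoid ambiguity
```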
IoU is a standard for measuring how accurately objects are detected on a given dataset. It is a simple metric that applies to any task whose output is a predicted bounding box. To use IoU for objects of arbitrary size and shape, the rough extent of each object must be annotated in the training images — this is the idea of a prior box.
In other words, the metric measures the agreement between ground truth and prediction: the greater the overlap, the higher the value.
Faster R-CNN makes a related observation: nearby boxes are approximately related by a linear transform, so learning the mapping from a prior box to the ground-truth box is easier than regressing the box coordinates directly. A sketch of this offset encoding follows.
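A minimal sketch of the offset encoding and decoding used throughout this post: training targets are normalized offsets between the crop (prior) box and the ground-truth box, and detection inverts them. The crops here are square, so width, height, and `side_len` coincide.

```python
def encode_offsets(gt, prior, side_len):
    # Normalized offsets between ground truth and the crop box,
    # matching delta = (x1 - x1_) / side_len in the sample generator below.
    return [(g - p) / side_len for g, p in zip(gt, prior)]

def decode_offsets(prior, offsets, side_len):
    # Invert the encoding: x1 = x1_ + side_len * delta.
    return [p + side_len * d for p, d in zip(prior, offsets)]

gt, prior = [95, 72, 201, 180], [90, 70, 190, 170]
deltas = encode_offsets(gt, prior, side_len=100)
print(decode_offsets(prior, deltas, side_len=100))  # recovers [95, 72, 201, 180]
```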
IoU formula
IoU reflects the intersection-over-union between two boxes: IoU = Area(A ∩ B) / Area(A ∪ B). The larger the IoU, the greater the overlap. The `isMin` variant in the code below divides by the smaller box's area instead; it is used in the final O-Net NMS.
```python
import numpy as np

def iou(box, boxes, isMin=False):
    # box: [x1, y1, x2, y2]; boxes: [N, 4]
    box_area = (box[3] - box[1]) * (box[2] - box[0])
    boxes_area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])
    # intersection rectangle
    xx1 = np.maximum(box[0], boxes[:, 0])
    yy1 = np.maximum(box[1], boxes[:, 1])
    xx2 = np.minimum(box[2], boxes[:, 2])
    yy2 = np.minimum(box[3], boxes[:, 3])
    w = np.maximum(0, (xx2 - xx1))  # clamp to 0 when the boxes do not overlap
    h = np.maximum(0, (yy2 - yy1))
    area = w * h
    if isMin:
        # intersection over the smaller box's area
        return np.true_divide(area, np.minimum(box_area, boxes_area))
    else:
        # standard intersection over union
        return np.true_divide(area, box_area + boxes_area - area)
```
NMS algorithm
- NMS directly addresses the problem of many duplicate output boxes:
- pick the box with the highest confidence;
- compute the IoU between it and every remaining box;
- filter out boxes whose IoU exceeds a threshold, which removes most duplicates.
Implementation:
```python
def nms(boxes, thresh=0.3, isMin=False):
    # Return an empty array when there are no boxes (guards against errors).
    if boxes.shape[0] == 0:
        return np.array([])
    # Sort by confidence in descending order; each box is [x1, y1, x2, y2, c].
    _boxes = boxes[(-boxes[:, 4]).argsort()]
    r_boxes = []  # boxes to keep
    # Compare the highest-confidence box against the rest until <= 1 box remains.
    while _boxes.shape[0] > 1:
        a_box = _boxes[0]        # the top box
        b_boxes = _boxes[1:]     # the remaining boxes
        r_boxes.append(a_box)    # keep the top box
        # Keep only boxes whose IoU with the top box is below the threshold.
        index = np.where(iou(a_box, b_boxes, isMin) < thresh)
        _boxes = b_boxes[index]  # loop condition: the survivors
    if _boxes.shape[0] > 0:
        r_boxes.append(_boxes[0])  # keep the last remaining box
    # Stack the kept boxes into a matrix along axis 0.
    return np.stack(r_boxes)
```
That covers the fundamentals; on to the working code.
Network model implementation:
I made two modifications here (an example of the pooling swap follows this list):
- Added BatchNorm so training converges faster.
- Replaced every max-pooling layer with a stride-2 convolution (there is plenty of data, so overfitting is not a concern, though this does increase the amount of computation). The change improved results noticeably.
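A before/after sketch of that swap (my illustration, not part of the original listing):

```python
from torch import nn

# Original-style downsampling: parameter-free max pooling.
pool = nn.MaxPool2d(kernel_size=3, stride=2)

# Replacement used here: a learned stride-2 convolution that keeps the
# channel count, trading extra computation for a learnable downsample.
conv_downsample = nn.Conv2d(in_channels=10, out_channels=10,
                            kernel_size=(3, 3), stride=(2, 2))
```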
```python
from torch import nn
import torch


class PNet(nn.Module):
    def __init__(self):
        super(PNet, self).__init__()
        self.name = "pNet"
        self.pre_layer = nn.Sequential(
            nn.Conv2d(in_channels=3, out_channels=10, kernel_size=(3, 3),
                      stride=(1, 1), padding=(1, 1)),  # conv1
            nn.PReLU(),  # prelu1
            nn.Conv2d(in_channels=10, out_channels=10, kernel_size=(3, 3),
                      stride=(2, 2)),  # stride-2 conv replaces max pooling
            nn.Conv2d(10, 16, kernel_size=(3, 3), stride=(1, 1)),  # conv2
            nn.PReLU(),  # prelu2
            nn.Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1)),  # conv3
            nn.PReLU()   # prelu3
        )
        self.conv4_1 = nn.Conv2d(32, 1, kernel_size=(1, 1), stride=(1, 1))  # confidence head
        self.conv4_2 = nn.Conv2d(32, 4, kernel_size=(1, 1), stride=(1, 1))  # offset head

    def forward(self, x):
        x = self.pre_layer(x)
        # confidence goes through sigmoid (required before BCELoss)
        cond = torch.sigmoid(self.conv4_1(x))
        # offsets need no activation: output them as-is
        offset = self.conv4_2(x)
        return cond, offset
```
R-Net
```python
# R-Net
class RNet(nn.Module):
    def __init__(self):
        super(RNet, self).__init__()
        self.name = "RNet"
        self.pre_layer = nn.Sequential(
            nn.Conv2d(in_channels=3, out_channels=28, kernel_size=(3, 3),
                      stride=(1, 1), padding=(1, 1)),  # conv1
            nn.BatchNorm2d(28),
            nn.PReLU(),  # prelu1
            nn.Conv2d(in_channels=28, out_channels=28, kernel_size=(3, 3),
                      stride=(2, 2)),  # replaces pool1
            nn.BatchNorm2d(28),
            nn.PReLU(),
            nn.Conv2d(28, 48, kernel_size=(3, 3), stride=(1, 1)),  # conv2
            nn.BatchNorm2d(48),
            nn.PReLU(),  # prelu2
            nn.Conv2d(in_channels=48, out_channels=48, kernel_size=(3, 3),
                      stride=(2, 2)),  # replaces pool2
            nn.BatchNorm2d(48),
            nn.PReLU(),
            nn.Conv2d(48, 64, kernel_size=(2, 2), stride=(1, 1)),  # conv3
            nn.BatchNorm2d(64),
            nn.PReLU()  # prelu3
        )
        self.conv4 = nn.Linear(64 * 3 * 3, 128)  # conv4 (fully connected)
        self.prelu4 = nn.PReLU()  # prelu4
        # detection head
        self.conv5_1 = nn.Linear(128, 1)
        # bounding-box regression head
        self.conv5_2 = nn.Linear(128, 4)

    def forward(self, x):
        # backbone
        x = self.pre_layer(x)
        x = x.view(x.size(0), -1)
        x = self.conv4(x)
        x = self.prelu4(x)
        # detection
        label = torch.sigmoid(self.conv5_1(x))  # confidence
        offset = self.conv5_2(x)                # offsets
        return label, offset
```
Data preparation: the CelebA and WIDER FACE datasets are used, as in the paper.
First, merge CelebA's bounding boxes and landmarks into a single annotation file:
```python
# note: list_landmarks_align_celeba.txt holds landmarks for the *aligned* crops;
# for in-the-wild boxes, list_landmarks_celeba.txt is the matching landmark file
landmarks_path = r"F:\CelebA\Anno\list_landmarks_align_celeba.txt"
bbox_path = r"F:\CelebA\Anno\list_bbox_celeba.txt"
save_path = "anno.txt"

with open(landmarks_path, "r") as f:
    landmarks = f.readlines()
with open(bbox_path, "r") as f:
    bbox = f.readlines()

with open(save_path, "w") as f:
    for i, (line1, line2) in enumerate(zip(bbox, landmarks)):
        if i < 1:
            f.write(line1)  # first line: sample count
        elif i == 1:
            strs = line1.strip() + " " + line2  # header line: merged column names
            f.write(strs)
        else:
            # data lines: box fields + landmark fields (drop the duplicate filename)
            strs = line1.strip().split() + line2.strip().split()[1:]
            strs = " ".join(strs) + "\n"
            f.write(strs)
```
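Each merged line then reads `filename x y w h lx1 ly1 ... lx5 ly5`; for reference, a line would look roughly like this (illustrative values, not a real entry):

```text
000001.jpg 95 71 226 313 165 184 244 176 196 249 194 271 266 260
```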
O-Net implementation:
```python
# O-Net
class ONet(nn.Module):
    def __init__(self):
        super(ONet, self).__init__()
        self.name = "oNet"
        # backbone
        self.pre_layer = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),  # conv1
            nn.BatchNorm2d(32),
            nn.PReLU(),  # prelu1
            nn.Conv2d(in_channels=32, out_channels=32, kernel_size=(3, 3),
                      stride=(2, 2)),  # replaces pool1
            nn.BatchNorm2d(32),
            nn.PReLU(),
            nn.Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1)),  # conv2
            nn.BatchNorm2d(64),
            nn.PReLU(),  # prelu2
            nn.Conv2d(in_channels=64, out_channels=64, kernel_size=(3, 3),
                      stride=(2, 2)),  # replaces pool2
            nn.BatchNorm2d(64),
            nn.PReLU(),
            nn.Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1)),  # conv3
            nn.BatchNorm2d(64),
            nn.PReLU(),  # prelu3
            nn.Conv2d(in_channels=64, out_channels=64, kernel_size=(2, 2),
                      stride=(2, 2)),  # replaces pool3
            nn.BatchNorm2d(64),
            nn.PReLU(),
            nn.Conv2d(64, 128, kernel_size=(2, 2), stride=(1, 1)),  # conv4
            nn.PReLU(),  # prelu4
        )
        self.conv5 = nn.Linear(128 * 3 * 3, 256)  # conv5 (fully connected)
        self.prelu5 = nn.PReLU()  # prelu5
        # detection head
        self.conv6_1 = nn.Linear(256, 1)
        # bounding-box regression head
        self.conv6_2 = nn.Linear(256, 4)

    def forward(self, x):
        # backbone
        x = self.pre_layer(x)
        x = x.reshape(x.size(0), -1)
        x = self.conv5(x)
        x = self.prelu5(x)
        # detection
        label = torch.sigmoid(self.conv6_1(x))  # confidence
        offset = self.conv6_2(x)                # offsets
        return label, offset


if __name__ == '__main__':
    # shape check: each network on its own input size
    x = torch.randn(2, 3, 12, 12)
    x2 = torch.randn(2, 3, 24, 24)
    x3 = torch.randn(2, 3, 48, 48)
    model1 = PNet()
    print(model1(x)[0].shape)
    print(model1(x)[1].shape)
    model2 = RNet()
    print(model2(x2)[0].shape)
    print(model2(x2)[1].shape)
    model3 = ONet()
    print(model3(x3)[0].shape)
    print(model3(x3)[1].shape)
```
Sample generation (CelebA)
```python
import os
import traceback

import numpy as np
from PIL import Image, ImageDraw

from tools import utils


class GenerateData():
    def __init__(self, anno_src=r"anno.txt",
                 imgs_path=r"F:\CelebA\Img\img_celeba\img_celeba",
                 save_path=r"D:\DataSet"):
        self.anno_src = anno_src
        self.imgs_path = imgs_path
        self.save_path = save_path
        if not os.path.exists(self.save_path):
            os.makedirs(self.save_path)

    def run(self, size=12):
        print("gen %i image" % size)
        for face_size in [size]:
            # sample image directories
            positive_image_dir = os.path.join(self.save_path, str(face_size), "positive")
            negative_image_dir = os.path.join(self.save_path, str(face_size), "negative")
            part_image_dir = os.path.join(self.save_path, str(face_size), "part")
            print(positive_image_dir, negative_image_dir, part_image_dir)
            for dir_path in [positive_image_dir, negative_image_dir, part_image_dir]:
                if not os.path.exists(dir_path):
                    os.makedirs(dir_path)

            # sample label files
            positive_anno_filename = os.path.join(self.save_path, str(face_size), "positive.txt")
            negative_anno_filename = os.path.join(self.save_path, str(face_size), "negative.txt")
            part_anno_filename = os.path.join(self.save_path, str(face_size), "part.txt")

            # counters used to name the generated crops
            positive_count = 0
            negative_count = 0
            part_count = 0

            # wrap the file operations so one bad sample cannot crash the run
            try:
                positive_anno_file = open(positive_anno_filename, "w")
                negative_anno_file = open(negative_anno_filename, "w")
                part_anno_file = open(part_anno_filename, "w")

                for i, line in enumerate(open(self.anno_src)):
                    if i < 2:
                        continue  # skip the count line and the header line
                    try:
                        strs = line.strip().split(" ")
                        image_filename = strs[0].strip()
                        image_file = os.path.join(self.imgs_path, image_filename)

                        with Image.open(image_file) as img:
                            img_w, img_h = img.size
                            x1 = float(strs[1].strip())
                            y1 = float(strs[2].strip())
                            w = float(strs[3].strip())
                            h = float(strs[4].strip())
                            x2 = float(x1 + w)
                            y2 = float(y1 + h)

                            # the five facial landmarks
                            px1 = float(strs[5].strip())
                            py1 = float(strs[6].strip())
                            px2 = float(strs[7].strip())
                            py2 = float(strs[8].strip())
                            px3 = float(strs[9].strip())
                            py3 = float(strs[10].strip())
                            px4 = float(strs[11].strip())
                            py4 = float(strs[12].strip())
                            px5 = float(strs[13].strip())
                            py5 = float(strs[14].strip())

                            # filter out boxes that are too small or out of range
                            if max(w, h) < 40 or x1 < 0 or y1 < 0 or w < 0 or h < 0:
                                continue

                            # CelebA boxes are loose, so tighten them a little
                            x1 = int(x1 + w * 0.12)
                            y1 = int(y1 + h * 0.1)
                            x2 = int(x1 + w * 0.9)
                            y2 = int(y1 + h * 0.85)
                            w = int(x2 - x1)  # width after the adjustment
                            h = int(y2 - y1)
                            boxes = [[x1, y1, x2, y2]]  # 2-D: iou() expects a batch of boxes

                            # face center
                            cx = x1 + w / 2
                            cy = y1 + h / 2

                            # jitter the center to multiply positive and part samples
                            for _ in range(2):
                                w_ = np.random.randint(-w * 0.1, w * 0.1)  # +-10% shift
                                h_ = np.random.randint(-h * 0.1, h * 0.1)
                                cx_ = cx + w_
                                cy_ = cy + h_

                                # square crop with a randomized side length,
                                # matching the square 12/24/48 network inputs
                                side_len = np.random.randint(int(min(w, h) * 0.8),
                                                             np.ceil(1.25 * max(w, h)))
                                x1_ = np.maximum(cx_ - side_len / 2, 0)  # clamp to the image
                                y1_ = np.maximum(cy_ - side_len / 2, 0)
                                x2_ = x1_ + side_len
                                y2_ = y1_ + side_len
                                crop_box = np.array([x1_, y1_, x2_, y2_])

                                # normalized offsets: delta = (x1 - x1_) / side_len
                                offset_x1 = (x1 - x1_) / side_len
                                offset_y1 = (y1 - y1_) / side_len
                                offset_x2 = (x2 - x2_) / side_len
                                offset_y2 = (y2 - y2_) / side_len
                                # landmark offsets
                                offset_px1 = (px1 - x1_) / side_len
                                offset_py1 = (py1 - y1_) / side_len
                                offset_px2 = (px2 - x1_) / side_len
                                offset_py2 = (py2 - y1_) / side_len
                                offset_px3 = (px3 - x1_) / side_len
                                offset_py3 = (py3 - y1_) / side_len
                                offset_px4 = (px4 - x1_) / side_len
                                offset_py4 = (py4 - y1_) / side_len
                                offset_px5 = (px5 - x1_) / side_len
                                offset_py5 = (py5 - y1_) / side_len

                                # crop and resize to the network input size
                                face_crop = img.crop(crop_box)
                                face_resize = face_crop.resize(
                                    (face_size, face_size),
                                    Image.ANTIALIAS)  # Image.LANCZOS on Pillow >= 10

                                # IoU between the crop and the original box decides the label
                                iou = utils.iou(crop_box, np.array(boxes))[0]
                                if iou > 0.65:  # positive
                                    positive_anno_file.write(
                                        "positive/{0}.jpg {1} {2} {3} {4} {5} {6} {7} {8} {9} {10} {11} {12} {13} {14} {15}\n".format(
                                            positive_count, 1, offset_x1, offset_y1, offset_x2, offset_y2,
                                            offset_px1, offset_py1, offset_px2, offset_py2, offset_px3,
                                            offset_py3, offset_px4, offset_py4, offset_px5, offset_py5))
                                    positive_anno_file.flush()
                                    face_resize.save(os.path.join(positive_image_dir,
                                                                  "{0}.jpg".format(positive_count)))
                                    positive_count += 1
                                elif iou > 0.4:  # part sample
                                    part_anno_file.write(
                                        "part/{0}.jpg {1} {2} {3} {4} {5} {6} {7} {8} {9} {10} {11} {12} {13} {14} {15}\n".format(
                                            part_count, 2, offset_x1, offset_y1, offset_x2, offset_y2,
                                            offset_px1, offset_py1, offset_px2, offset_py2, offset_px3,
                                            offset_py3, offset_px4, offset_py4, offset_px5, offset_py5))
                                    part_anno_file.flush()
                                    face_resize.save(os.path.join(part_image_dir,
                                                                  "{0}.jpg".format(part_count)))
                                    part_count += 1
                                elif iou < 0.29:  # negative (few are produced this way)
                                    negative_anno_file.write(
                                        "negative/{0}.jpg {1} 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n".format(negative_count, 0))
                                    negative_anno_file.flush()
                                    face_resize.save(os.path.join(negative_image_dir,
                                                                  "{0}.jpg".format(negative_count)))
                                    negative_count += 1

                            # extra negatives: random crops anywhere in the image
                            _boxes = np.array(boxes)
                            for i in range(2):
                                side_len = np.random.randint(face_size, min(img_w, img_h) / 2)
                                x_ = np.random.randint(0, img_w - side_len)
                                y_ = np.random.randint(0, img_h - side_len)
                                crop_box = np.array([x_, y_, x_ + side_len, y_ + side_len])
                                # keep only crops that barely overlap any face
                                if np.max(utils.iou(crop_box, _boxes)) < 0.29:
                                    face_crop = img.crop(crop_box)
                                    face_resize = face_crop.resize((face_size, face_size),
                                                                   Image.ANTIALIAS)
                                    negative_anno_file.write(
                                        "negative/{0}.jpg {1} 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n".format(negative_count, 0))
                                    negative_anno_file.flush()
                                    face_resize.save(os.path.join(negative_image_dir,
                                                                  "{0}.jpg".format(negative_count)))
                                    negative_count += 1
                    except Exception as e:
                        print(e)
                        traceback.print_exc()
            except Exception as e:
                print(e)
            finally:
                # close the label files
                positive_anno_file.close()
                negative_anno_file.close()
                part_anno_file.close()


if __name__ == '__main__':
    data = GenerateData()
    data.run(size=12)
    data.run(size=24)
    data.run(size=48)
```
WIDER FACE data code
```python
import os
import traceback

import numpy as np
from PIL import Image, ImageDraw

from tools import utils


def gentxt():
    # flatten wider_face_train_bbx_gt.txt into one "filename x y w h x y w h ..." line per image
    imgs_path = r"F:\widerface\WIDER_train\images"
    bbox_txt = r"F:\widerface\wider_face_split\wider_face_train_bbx_gt.txt"
    with open(bbox_txt, "r") as f:
        data = f.readlines()
    empty_dict = {}
    temp_name = None
    for i, line in enumerate(data):
        line = line.strip()
        if line.endswith("jpg"):
            # a new image: start a fresh box list
            empty_dict[line] = []
            temp_name = line
        elif len(line) > 10:
            # a box line: keep only x, y, w, h
            empty_dict[temp_name].append(line.split()[:4])
    with open("wider_anno.txt", "w") as f:
        for key in empty_dict.keys():
            values = empty_dict[key]
            f.write(f"{key} ")
            for value in values:
                f.write(" ".join(value))
                f.write(" ")
            f.write("\n")


class GenerateData():
    def __init__(self, anno_src=r"wider_anno.txt",
                 imgs_path=r"F:\widerface\WIDER_train\images",
                 save_path=r"D:\DataSet\wider"):
        self.anno_src = anno_src
        self.imgs_path = imgs_path
        self.save_path = save_path
        if not os.path.exists(self.save_path):
            os.makedirs(self.save_path)

    def run(self, size=12):
        print("gen %i image" % size)
        for face_size in [size]:
            # image directories and label files, exactly as in the CelebA generator
            positive_image_dir = os.path.join(self.save_path, str(face_size), "positive")
            negative_image_dir = os.path.join(self.save_path, str(face_size), "negative")
            part_image_dir = os.path.join(self.save_path, str(face_size), "part")
            print(positive_image_dir, negative_image_dir, part_image_dir)
            for dir_path in [positive_image_dir, negative_image_dir, part_image_dir]:
                if not os.path.exists(dir_path):
                    os.makedirs(dir_path)

            positive_anno_filename = os.path.join(self.save_path, str(face_size), "positive.txt")
            negative_anno_filename = os.path.join(self.save_path, str(face_size), "negative.txt")
            part_anno_filename = os.path.join(self.save_path, str(face_size), "part.txt")

            positive_count = 0
            negative_count = 0
            part_count = 0

            try:
                positive_anno_file = open(positive_anno_filename, "w")
                negative_anno_file = open(negative_anno_filename, "w")
                part_anno_file = open(part_anno_filename, "w")

                for i, line in enumerate(open(self.anno_src)):
                    try:
                        strs = line.strip().split(" ")
                        image_filename = strs[0].strip()
                        image_file = os.path.join(self.imgs_path, image_filename)

                        # a WIDER image contains several faces: group the values by four
                        values = list(map(float, strs[1:]))
                        all_boxes = []
                        for index in range(0, len(values), 4):
                            all_boxes.append(values[index:index + 4])

                        with Image.open(image_file) as img:
                            for one_box in all_boxes:
                                img_w, img_h = img.size
                                x1, y1, w, h = one_box
                                x2 = float(x1 + w)
                                y2 = float(y1 + h)

                                # filter out boxes that are too small or out of range
                                if max(w, h) < 40 or x1 < 0 or y1 < 0 or w < 0 or h < 0:
                                    continue
                                boxes = [[x1, y1, x2, y2]]

                                # face center
                                cx = x1 + w / 2
                                cy = y1 + h / 2

                                for _ in range(1):
                                    w_ = np.random.randint(-w * 0.2, w * 0.2)  # +-20% jitter
                                    h_ = np.random.randint(-h * 0.2, h * 0.2)
                                    cx_ = cx + w_
                                    cy_ = cy + h_

                                    side_len = np.random.randint(int(min(w, h) * 0.8),
                                                                 np.ceil(1.25 * max(w, h)))
                                    x1_ = np.maximum(cx_ - side_len / 2, 0)
                                    y1_ = np.maximum(cy_ - side_len / 2, 0)
                                    x2_ = x1_ + side_len
                                    y2_ = y1_ + side_len
                                    crop_box = np.array([x1_, y1_, x2_, y2_])

                                    offset_x1 = (x1 - x1_) / side_len
                                    offset_y1 = (y1 - y1_) / side_len
                                    offset_x2 = (x2 - x2_) / side_len
                                    offset_y2 = (y2 - y2_) / side_len
                                    # WIDER has no landmark annotations: write zeros so the
                                    # label format matches the CelebA samples
                                    offset_px1 = offset_py1 = offset_px2 = offset_py2 = 0
                                    offset_px3 = offset_py3 = offset_px4 = offset_py4 = 0
                                    offset_px5 = offset_py5 = 0

                                    face_crop = img.crop(crop_box)
                                    face_resize = face_crop.resize(
                                        (face_size, face_size),
                                        Image.ANTIALIAS)  # Image.LANCZOS on Pillow >= 10

                                    iou = utils.iou(crop_box, np.array(boxes))[0]
                                    if iou > 0.65:  # positive
                                        positive_anno_file.write(
                                            "positive/{0}.jpg {1} {2} {3} {4} {5} {6} {7} {8} {9} {10} {11} {12} {13} {14} {15}\n".format(
                                                positive_count, 1, offset_x1, offset_y1, offset_x2, offset_y2,
                                                offset_px1, offset_py1, offset_px2, offset_py2, offset_px3,
                                                offset_py3, offset_px4, offset_py4, offset_px5, offset_py5))
                                        positive_anno_file.flush()
                                        face_resize.save(os.path.join(positive_image_dir,
                                                                      "{0}.jpg".format(positive_count)))
                                        positive_count += 1
                                    elif iou > 0.4:  # part sample
                                        part_anno_file.write(
                                            "part/{0}.jpg {1} {2} {3} {4} {5} {6} {7} {8} {9} {10} {11} {12} {13} {14} {15}\n".format(
                                                part_count, 2, offset_x1, offset_y1, offset_x2, offset_y2,
                                                offset_px1, offset_py1, offset_px2, offset_py2, offset_px3,
                                                offset_py3, offset_px4, offset_py4, offset_px5, offset_py5))
                                        part_anno_file.flush()
                                        face_resize.save(os.path.join(part_image_dir,
                                                                      "{0}.jpg".format(part_count)))
                                        part_count += 1
                                    elif iou < 0.29:  # negative
                                        negative_anno_file.write(
                                            "negative/{0}.jpg {1} 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n".format(negative_count, 0))
                                        negative_anno_file.flush()
                                        face_resize.save(os.path.join(negative_image_dir,
                                                                      "{0}.jpg".format(negative_count)))
                                        negative_count += 1
                                # (the random-crop negative loop from the CelebA generator
                                #  is commented out in this WIDER version)
                    except Exception as e:
                        print(e)
                        traceback.print_exc()
            except Exception as e:
                print(e)
            finally:
                positive_anno_file.close()
                negative_anno_file.close()
                part_anno_file.close()


if __name__ == '__main__':
    # gentxt()  # run once first to build wider_anno.txt
    data = GenerateData()
    data.run(size=12)
    data.run(size=24)
    data.run(size=48)
```
Dataset code
- Data augmentation is used.
- During training, the model would mistake hands or even dishes for faces, which was alarming. The cause is that in CelebA hands often occlude faces, the skin tones are skewed, and some subjects are poorly lit. Color-jitter augmentation (brightness, contrast, saturation) is therefore applied.
- Profile faces were also hard to detect, so random horizontal mirroring is applied to a good portion of the data; this worked well.
```python
# Dataset
import os

import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms

# note: the flip is applied to positives too without mirroring the offset labels;
# strictly, the x-offsets should be mirrored along with the image
tf1 = transforms.Compose([transforms.ColorJitter(brightness=0.5),
                          transforms.RandomHorizontalFlip(p=0.5)])
tf2 = transforms.Compose([transforms.ColorJitter(contrast=0.5),
                          transforms.RandomHorizontalFlip(p=0.5)])
tf3 = transforms.Compose([transforms.ColorJitter(saturation=0.5),
                          transforms.RandomHorizontalFlip(p=0.5)])
tf4 = transforms.Compose([transforms.RandomHorizontalFlip(p=0.5)])
tf = transforms.RandomChoice([tf1, tf2, tf3, tf4])


class FaceDataset(Dataset):
    def __init__(self, path_1=r"D:\DataSet", path_2=r"D:\DataSet\wider", size=12, tf=tf):
        super(FaceDataset, self).__init__()
        self.dataset = []
        self.size = size
        for path in [path_1, path_2]:
            self.base_path_1 = path
            self.path = os.path.join(self.base_path_1, str(self.size))
            for txt in ["positive.txt", "negative.txt", "part.txt"]:
                with open(os.path.join(self.path, txt), "r") as f:
                    data = f.readlines()
                for line in data:
                    line = line.strip().split()
                    img_path = os.path.join(self.path, line[0])
                    benkdata = " ".join(line[1:])
                    self.dataset.append([img_path, benkdata])
        self.tf = tf

    def __len__(self):
        return len(self.dataset)  # dataset length

    def __getitem__(self, index):
        img_path, strs = self.dataset[index]
        strs = strs.strip().split(" ")
        # label: confidence + offsets
        cond = torch.Tensor([int(strs[0])])  # keep the brackets, or the int is taken as a shape
        offset = torch.Tensor([float(strs[1]), float(strs[2]), float(strs[3]), float(strs[4])])
        # sample: open -> augment -> normalize and zero-center -> tensor -> CHW
        img = Image.open(img_path)
        img = self.tf(img)
        img = np.array(img) / 255. - 0.5
        img_data = torch.tensor(img, dtype=torch.float32)
        img_data = img_data.permute(2, 0, 1)  # HWC -> CHW
        return img_data, cond, offset


# quick test
if __name__ == '__main__':
    dataset = FaceDataset(size=12)
    print(dataset[0])
    print(len(dataset))
```
Network training
Training code
The training uses cosine-annealing warm restarts for the learning rate, SmoothL1 for coordinate regression, and BCELoss for classification.
P-Net training
```python
# Trainer used for all three networks
import os

from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts

os.environ['KMP_DUPLICATE_LIB_OK'] = 'TRUE'
from torch.utils.data import DataLoader
import torch
from torch import nn
import torch.optim as optim
from sampling import FaceDataset  # the dataset above
from models import models
from torch.utils.tensorboard import SummaryWriter


class Trainer:
    def __init__(self, net, save_path, dataset_size, isCuda=True, SummaryWriter_path=r"run"):
        # net, checkpoint path, sample size (12/24/48), CUDA switch
        self.net = net
        self.save_path = save_path
        self.dataset_path = dataset_size
        self.isCuda = isCuda

        # one numbered TensorBoard run directory (expN) per network
        summaryWriter_path = os.path.join(SummaryWriter_path, self.net.name)
        if not os.path.exists(summaryWriter_path):
            os.makedirs(summaryWriter_path)
        length = len(os.listdir(summaryWriter_path))
        path_name = os.path.join(summaryWriter_path, "exp" + str(length))
        os.makedirs(path_name)
        self.summaryWriter = SummaryWriter(path_name)

        if self.isCuda:
            self.net.cuda()

        # losses
        # confidence: BCELoss is binary cross-entropy, a special case of
        # CrossEntropyLoss; it requires a preceding sigmoid, which the nets apply
        self.cls_loss_fn = nn.BCELoss()
        # offsets: SmoothL1
        self.offset_loss_fn = nn.SmoothL1Loss()

        # optimizer
        self.optimizer = optim.SGD(self.net.parameters(), lr=0.0001, momentum=0.9)

        # resume training if a checkpoint exists
        if os.path.exists(self.save_path):
            net.load_state_dict(torch.load(self.save_path), strict=False)

    def train(self, epochs=1000):
        faceDataset = FaceDataset(size=self.dataset_path)
        # num_workers keeps the loader busy; drop_last avoids short-batch errors
        dataloader = DataLoader(faceDataset, batch_size=256, shuffle=True,
                                num_workers=4, drop_last=True)
        scheduler = CosineAnnealingWarmRestarts(self.optimizer, T_0=5, T_mult=1)
        self.best_loss = 1
        for epoch in range(epochs):
            for i, (img_data_, category_, offset_) in enumerate(dataloader):
                if self.isCuda:  # move the batch to GPU memory
                    img_data_ = img_data_.cuda()
                    category_ = category_.cuda()
                    offset_ = offset_.cuda()

                # forward pass: confidence and offsets
                _output_category, _output_offset = self.net(img_data_)
                output_category = _output_category.reshape(-1, 1)
                output_offset = _output_offset.reshape(-1, 4)

                # classification loss: mask out part samples (label 2)
                category_mask = torch.lt(category_, 2)
                category = torch.masked_select(category_, category_mask)
                output_category = torch.masked_select(output_category, category_mask)
                cls_loss = self.cls_loss_fn(output_category, category)

                # regression loss: negatives (label 0) carry no offsets, mask them out
                offset_mask = torch.gt(category_, 0)
                offset_index = torch.nonzero(offset_mask)[:, 0]
                offset = offset_[offset_index]
                output_offset = output_offset[offset_index]
                offset_loss = self.offset_loss_fn(output_offset, offset)

                # total loss
                loss = cls_loss + offset_loss

                # backward pass and parameter update
                self.optimizer.zero_grad()
                loss.backward()
                self.optimizer.step()

                print("epoch=", epoch, "loss:", loss.cpu().data.numpy(),
                      " cls_loss:", cls_loss.cpu().data.numpy(),
                      " offset_loss", offset_loss.cpu().data.numpy())
                self.summaryWriter.add_scalars("loss", {
                    "loss": loss.cpu().data.numpy(),
                    "cls_loss": cls_loss.cpu().data.numpy(),
                    "offset_loss": offset_loss.cpu().data.numpy()}, epoch)

                # save periodically and whenever the loss improves
                if i % 500 == 0:
                    torch.save(self.net.state_dict(), self.save_path)
                    print("save success")
                if loss.cpu().data.numpy() < self.best_loss:
                    self.best_loss = loss.cpu().data.numpy()
                    torch.save(self.net.state_dict(), self.save_path)
                    print("save success")
            scheduler.step()


if __name__ == '__main__':
    net = models.PNet()
    trainer = Trainer(net, 'pnet.pt', dataset_size=12)  # P-Net trains on the 12x12 samples
    trainer.train()
```
R-Net training
```python
# Trainer for R-Net: same structure as above, but with a heavier regression
# weighting, a smaller batch, and a different save policy
import os

os.environ['KMP_DUPLICATE_LIB_OK'] = 'TRUE'
from torch.utils.data import DataLoader
import torch
from torch import nn
import torch.optim as optim
from sampling import FaceDataset
from models import models
from torch.utils.tensorboard import SummaryWriter


class Trainer:
    def __init__(self, net, save_path, dataset_path, isCuda=True, SummaryWriter_path=r"run"):
        self.net = net
        self.save_path = save_path
        self.dataset_path = dataset_path
        self.isCuda = isCuda

        # numbered TensorBoard run directory per network
        summaryWriter_path = os.path.join(SummaryWriter_path, self.net.name)
        if not os.path.exists(summaryWriter_path):
            os.makedirs(summaryWriter_path)
        length = len(os.listdir(summaryWriter_path))
        path_name = os.path.join(summaryWriter_path, "exp" + str(length))
        os.makedirs(path_name)
        self.summaryWriter = SummaryWriter(path_name)

        if self.isCuda:
            self.net.cuda()

        self.cls_loss_fn = nn.BCELoss()          # confidence (sigmoid applied in the net)
        self.offset_loss_fn = nn.SmoothL1Loss()  # offsets
        self.optimizer = optim.SGD(self.net.parameters(), lr=0.0001, momentum=0.8)

        # resume if a checkpoint exists
        if os.path.exists(self.save_path):
            net.load_state_dict(torch.load(self.save_path), strict=False)

    def train(self):
        faceDataset = FaceDataset(size=self.dataset_path)
        dataloader = DataLoader(faceDataset, batch_size=128, shuffle=True,
                                num_workers=2, drop_last=True)
        self.best_loss = None
        while True:
            for i, (img_data_, category_, offset_) in enumerate(dataloader):
                if self.isCuda:
                    img_data_ = img_data_.cuda()
                    category_ = category_.cuda()
                    offset_ = offset_.cuda()

                _output_category, _output_offset = self.net(img_data_)
                output_category = _output_category.view(-1, 1)
                output_offset = _output_offset.view(-1, 4)

                # confidence loss: mask out part samples (label 2)
                category_mask = torch.lt(category_, 2)
                category = torch.masked_select(category_, category_mask)
                output_category = torch.masked_select(output_category, category_mask)
                cls_loss = self.cls_loss_fn(output_category, category)

                # offset loss: mask out negatives (label 0)
                offset_mask = torch.gt(category_, 0)
                offset_index = torch.nonzero(offset_mask)[:, 0]
                offset = offset_[offset_index]
                output_offset = output_offset[offset_index]
                offset_loss = self.offset_loss_fn(output_offset, offset)

                # R/O stages weight regression more heavily than classification
                loss = 0.5 * cls_loss + offset_loss
                if i == 0:
                    self.best_loss = loss.cpu().data.numpy()

                self.optimizer.zero_grad()
                loss.backward()
                self.optimizer.step()

                print("i=", i, "loss:", loss.cpu().data.numpy(),
                      " cls_loss:", cls_loss.cpu().data.numpy(),
                      " offset_loss", offset_loss.cpu().data.numpy())
                self.summaryWriter.add_scalars("loss", {
                    "loss": loss.cpu().data.numpy(),
                    "cls_loss": cls_loss.cpu().data.numpy(),
                    "offset_loss": offset_loss.cpu().data.numpy()}, i)

                # save periodically and whenever the loss improves
                if (i + 1) % 100 == 0 or self.best_loss > loss.cpu().data.numpy():
                    self.best_loss = loss.cpu().data.numpy()
                    torch.save(self.net.state_dict(), self.save_path)
                    print("save success")


if __name__ == '__main__':
    net = models.RNet()
    trainer = Trainer(net, 'rnet.pt', 24)  # R-Net trains on the 24x24 samples
    trainer.train()
```
O-Net training
The O-Net trainer is identical to the R-Net trainer above (the original listing repeated it verbatim, including a main block that instantiated RNet with the 24×24 samples — evidently a copy-paste slip). Only the entry point should change, presumably to:

```python
if __name__ == '__main__':
    net = models.ONet()
    trainer = Trainer(net, 'onet.pt', 48)  # O-Net trains on the 48x48 samples
    trainer.train()
```
Detection code
- P-Net is fully convolutional precisely so that images of any scale can be fed in; detections are then back-calculated from the output feature map (a sketch of that mapping follows).
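The key reverse mapping, condensed from the `__box` method below: each feature-map cell (row, col) corresponds to a 12×12 window at stride 2 in the scaled image, which is divided by the pyramid scale to land back on the original image.

```python
def feature_index_to_box(row, col, scale, stride=2, side_len=12):
    # Map a P-Net feature-map cell back to the 12x12 window it saw,
    # then undo the pyramid scaling to get original-image coordinates.
    # Rows map to y, columns to x.
    x1 = (col * stride) / scale
    y1 = (row * stride) / scale
    x2 = (col * stride + side_len) / scale
    y2 = (row * stride + side_len) / scale
    return x1, y1, x2, y2
```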
```python
import os
import time

import cv2
import numpy as np
import torch
from PIL import Image, ImageDraw
from torchvision import transforms

from tools import utils
from models import models


class Detector():
    def __init__(self, pnet_param="pnet.pt", rnet_param="rnet.pt", onet_param="onet.pt",
                 isCuda=True,
                 # stage thresholds: tune to the trained nets (p_cls/r_cls can often
                 # go higher; o_cls should end up near 0.99+)
                 p_cls=0.6, p_nms=0.5,
                 r_cls=0.6, r_nms=0.5,
                 o_cls=0.99, o_nms=0.6):
        self.isCuda = isCuda

        self.pnet = models.PNet()
        self.rnet = models.RNet()
        self.onet = models.ONet()
        if self.isCuda:
            self.pnet.cuda()
            self.rnet.cuda()
            self.onet.cuda()

        # load the trained weights
        self.pnet.load_state_dict(torch.load(pnet_param))
        self.rnet.load_state_dict(torch.load(rnet_param))
        self.onet.load_state_dict(torch.load(onet_param))

        # the networks contain BN, so eval() is mandatory at inference time
        self.pnet.eval()
        self.rnet.eval()
        self.onet.eval()

        self.p_cls = p_cls
        self.p_nms = p_nms
        self.r_cls = r_cls
        self.r_nms = r_nms
        self.o_cls = o_cls
        self.o_nms = o_nms

        self.__image_transform = transforms.Compose([transforms.ToTensor()])

    def detect(self, image):
        # stage 1: P-Net
        start_time = time.time()
        pnet_boxes = self.__pnet_detect(image)
        if pnet_boxes.shape[0] == 0:
            return np.array([])
        t_pnet = time.time() - start_time

        # stage 2: R-Net
        start_time = time.time()
        rnet_boxes = self.__rnet_detect(image, pnet_boxes)
        if rnet_boxes.shape[0] == 0:
            return np.array([])
        t_rnet = time.time() - start_time

        # stage 3: O-Net
        start_time = time.time()
        onet_boxes = self.__onet_detect(image, rnet_boxes)
        if onet_boxes.shape[0] == 0:
            return np.array([])
        t_onet = time.time() - start_time

        t_sum = t_pnet + t_rnet + t_onet
        print("total:{0} pnet:{1} rnet:{2} onet:{3}".format(t_sum, t_pnet, t_rnet, t_onet))
        return onet_boxes

    # P-Net stage: fully convolutional, so any image size can be fed in
    def __pnet_detect(self, image):
        boxes = []  # surviving candidate boxes
        img = image
        w, h = img.size
        min_side_len = min(w, h)
        scale = 1  # pyramid scale; 1 means the original resolution
        while min_side_len > 12:  # stop once the shorter side reaches the 12-pixel window
            img_data = self.__image_transform(img)
            if self.isCuda:
                img_data = img_data.cuda()
            img_data.unsqueeze_(0)  # add the batch dimension

            _cls, _offest = self.pnet(img_data)
            cls = _cls[0][0].cpu().data     # [H, W] confidence map
            offest = _offest[0].cpu().data  # [4, H, W] offset maps

            # feature-map cells above the confidence threshold; if faces get missed,
            # either the net is undertrained or p_cls is set too high
            idxs = torch.nonzero(torch.gt(cls, self.p_cls))
            for idx in idxs:
                # back-calculate each qualifying cell to a box on the original image
                boxes.append(self.__box(idx, offest, cls[idx[0], idx[1]], scale))

            scale *= 0.7  # shrink the image and go around again
            _w = int(w * scale)
            _h = int(h * scale)
            img = img.resize((_w, _h))
            min_side_len = min(_w, _h)
        return utils.nms(np.array(boxes), self.p_nms)

    # reverse mapping: feature-map index -> box on the original image
    def __box(self, start_index, offset, cls, scale, stride=2, side_len=12):
        # rows (index 0) map to y, columns (index 1) to x
        _x1 = (start_index[1].float() * stride) / scale
        _y1 = (start_index[0].float() * stride) / scale
        _x2 = (start_index[1].float() * stride + side_len) / scale
        _y2 = (start_index[0].float() * stride + side_len) / scale
        ow = _x2 - _x1  # width/height of the anchor window
        oh = _y2 - _y1
        # offsets for this cell: [dx1, dy1, dx2, dy2]
        _offset = offset[:, start_index[0], start_index[1]]
        # invert the training encoding: x1 = x1_ + w * delta
        x1 = _x1 + ow * _offset[0]
        y1 = _y1 + oh * _offset[1]
        x2 = _x2 + ow * _offset[2]
        y2 = _y2 + oh * _offset[3]
        return [x1, y1, x2, y2, cls]

    # R-Net stage
    def __rnet_detect(self, image, pnet_boxes):
        _img_dataset = []
        # expand P-Net's boxes to squares around their centers before cropping
        _pnet_boxes = utils.convert_to_square(pnet_boxes)
        for _box in _pnet_boxes:
            _x1, _y1, _x2, _y2 = int(_box[0]), int(_box[1]), int(_box[2]), int(_box[3])
            img = image.crop((_x1, _y1, _x2, _y2))
            img = img.resize((24, 24))
            _img_dataset.append(self.__image_transform(img))
        img_dataset = torch.stack(_img_dataset)  # batch of 24x24 crops
        if self.isCuda:
            img_dataset = img_dataset.cuda()

        _cls, _offset = self.rnet(img_dataset)
        cls = _cls.cpu().data.numpy()
        offset = _offset.cpu().data.numpy()

        boxes = []
        idxs, _ = np.where(cls > self.r_cls)  # rows whose confidence passes the threshold
        for idx in idxs:
            _box = _pnet_boxes[idx]  # the P-Net box is the anchor
            _x1, _y1, _x2, _y2 = int(_box[0]), int(_box[1]), int(_box[2]), int(_box[3])
            ow = _x2 - _x1
            oh = _y2 - _y1
            x1 = _x1 + ow * offset[idx][0]
            y1 = _y1 + oh * offset[idx][1]
            x2 = _x2 + ow * offset[idx][2]
            y2 = _y2 + oh * offset[idx][3]
            boxes.append([x1, y1, x2, y2, cls[idx][0]])
        return utils.nms(np.array(boxes), self.r_nms)

    # O-Net stage
    def __onet_detect(self, image, rnet_boxes):
        _img_dataset = []
        _rnet_boxes = utils.convert_to_square(rnet_boxes)
        for _box in _rnet_boxes:
            _x1, _y1, _x2, _y2 = int(_box[0]), int(_box[1]), int(_box[2]), int(_box[3])
            img = image.crop((_x1, _y1, _x2, _y2))
            img = img.resize((48, 48))
            _img_dataset.append(self.__image_transform(img))
        img_dataset = torch.stack(_img_dataset)
        if self.isCuda:
            img_dataset = img_dataset.cuda()

        _cls, _offset = self.onet(img_dataset)
        cls = _cls.cpu().data.numpy()
        offset = _offset.cpu().data.numpy()

        boxes = []
        idxs, _ = np.where(cls > self.o_cls)  # push toward 0.999xx for cleaner output
        for idx in idxs:
            _box = _rnet_boxes[idx]  # the R-Net box is the anchor (square, so ow == oh)
            _x1, _y1, _x2, _y2 = int(_box[0]), int(_box[1]), int(_box[2]), int(_box[3])
            ow = _x2 - _x1
            oh = _y2 - _y1
            x1 = _x1 + ow * offset[idx][0]
            y1 = _y1 + oh * offset[idx][1]
            x2 = _x2 + ow * offset[idx][2]
            y2 = _y2 + oh * offset[idx][3]
            boxes.append([x1, y1, x2, y2, cls[idx][0]])
        # the final NMS uses the min-area IoU variant
        return utils.nms(np.array(boxes), self.o_nms, isMin=True)


def detect(imgs_path="./test_images", save_path="./result_images", test_video=False):
    if not os.path.exists(save_path):
        os.makedirs(save_path)
    detector = Detector()
    if test_video:
        cap = cv2.VideoCapture('./test_video/蔡徐坤.mp4')
        num = 0
        while cap.isOpened():
            ret, frame = cap.read()
            if not ret or (cv2.waitKey(25) & 0xFF == ord('q')):
                break
            image = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            print("-------------------")
            boxes = detector.detect(image)
            print("size:", image.size)
            for box in boxes:
                x1, y1, x2, y2 = int(box[0]), int(box[1]), int(box[2]), int(box[3])
                print(x1, y1, x2, y2, "conf:", box[4])
                head = image.crop((x1, y1, x2, y2))
                head.save(f"./蔡徐坤照片/{num}.jpg")
                num += 1
                cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 0, 255), 2)
            cv2.imshow("res", frame)
        cap.release()
        cv2.destroyAllWindows()
    else:
        for img in os.listdir(imgs_path):
            with Image.open(os.path.join(imgs_path, img)) as image:
                print("-------------------")
                boxes = detector.detect(image)
                print("size:", image.size)
                img_draw = ImageDraw.Draw(image)
                for box in boxes:
                    x1, y1, x2, y2 = int(box[0]), int(box[1]), int(box[2]), int(box[3])
                    print(x1, y1, x2, y2, "conf:", box[4])
                    img_draw.rectangle((x1, y1, x2, y2), outline="red")
                image.show()
                image.save(os.path.join(save_path, img))


if __name__ == '__main__':
    detect()
```
Summary
Face detection raised plenty of problems along the way:
- In the first stage, the image pyramid invokes the very shallow P-Net over and over, so data is copied from main memory to GPU memory and back repeatedly. This copying is expensive — sometimes it costs more than the computation itself.
- P-Net's speed dominates the whole pipeline; crop-and-save file operations in the R/O stages, serial for loops, and hardware limits all add to the latency. Causes: the image pyramid itself is costly, the box back-calculation was not done with tensors and matrix math, crops were not done with slicing, and in general the higher-level the language loop, the slower it runs. Remedies: tune the pyramid scale factor to your actual needs, and vectorize with tensor/matrix operations (a sketch follows this list).
- Remember to call eval() when running the model for inference (the networks contain BN).
- The best way to solve most problems is more, and more accurate, data.
- Don't use ReLU here — it performs poorly, since zeroing the negative half-axis loses some information (hence PReLU).
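As suggested above, the per-index Python loop in `__pnet_detect` can be replaced by one batched tensor computation. A sketch of that rewrite (my vectorized version of the same math as `__box`, not the original code):

```python
import torch

def boxes_from_feature_map(cls_map, offset_map, scale, thresh=0.6,
                           stride=2, side_len=12):
    # cls_map: [H, W] confidences; offset_map: [4, H, W] offsets.
    idxs = torch.nonzero(cls_map > thresh)       # [N, 2] (row, col) indices
    if idxs.shape[0] == 0:
        return torch.empty(0, 5)
    rows, cols = idxs[:, 0].float(), idxs[:, 1].float()
    x1 = (cols * stride) / scale                 # all anchor windows at once
    y1 = (rows * stride) / scale
    x2 = (cols * stride + side_len) / scale
    y2 = (rows * stride + side_len) / scale
    off = offset_map[:, idxs[:, 0], idxs[:, 1]]  # [4, N] gathered offsets
    w, h = x2 - x1, y2 - y1
    boxes = torch.stack([x1 + w * off[0], y1 + h * off[1],
                         x2 + w * off[2], y2 + h * off[3],
                         cls_map[idxs[:, 0], idxs[:, 1]]], dim=1)
    return boxes                                 # [N, 5], ready for NMS
```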
This is a first write-up; corrections and advice are welcome.
The full code has been uploaded to CSDN (0 points).
Download link
My own focus is object detection; anyone interested is welcome to get in touch.
WX: