CTPN Source Code Analysis 3.2: The loss() Function
Text Detection Algorithm 1: CTPN
CTPN Source Code Analysis 1: Data Preprocessing (split_label.py)
CTPN Source Code Analysis 2: Overall Code Structure and Framework
CTPN Source Code Analysis 3.1: The model() Function
CTPN Source Code Analysis 3.2: The loss() Function
CTPN Source Code Analysis 4: The Loss Function
CTPN Source Code Analysis 5: The Text-Line Construction Algorithm
Training CTPN on Your Own Dataset
Since the CTPN code analyzed here was re-packaged by banjin-xjy and eragonruan, the overall structure is very clear and concise, unlike the Faster R-CNN code I went through last time, which jumped around so much that my head was spinning after only a few hops [facepalm]. Respect to the authors! PS: there are bound to be mistakes in my understanding and in the comments, so corrections are welcome!
Source code analyzed: https://github.com/eragonruan/text-detection-ctpn
Zhihu: Understanding CTPN from the code-implementation perspective: https://zhuanlan.zhihu.com/p/49588885
Zhihu: Understanding the CTPN text detection network: https://zhuanlan.zhihu.com/p/77883736
Zhihu: Scene text detection with CTPN, principles and implementation: https://zhuanlan.zhihu.com/p/34757009
loss() function flow:
loss() function code:
```python
'''
Compute the loss.
bbox_pred: predicted box regression values
cls_pred:  classification scores of the predicted boxes
bbox:      ground-truth (GT) boxes
im_info:   image info (height, width, scale)
'''
def loss(bbox_pred, cls_pred, bbox, im_info):
    # Generate the RPN training targets for the anchors
    rpn_data = anchor_target_layer(cls_pred, bbox, im_info, "anchor_target_layer")
    # rpn_data = [rpn_labels, rpn_bbox_targets, rpn_bbox_inside_weights, rpn_bbox_outside_weights]

    # classification loss
    # transpose: (1, H, W, A x d) -> (1, H, WxA, d)
    cls_pred_shape = tf.shape(cls_pred)  # cls_pred has shape (?,?,?,20); tf.shape returns a rank-1 tensor of length 4
    cls_pred_reshape = tf.reshape(cls_pred, [cls_pred_shape[0], cls_pred_shape[1], -1, 2])  # shape (?,?,?,2)

    rpn_cls_score = tf.reshape(cls_pred_reshape, [-1, 2])  # classification scores, shape (?,2)
    rpn_label = tf.reshape(rpn_data[0], [-1])              # anchor labels
    # ignore_label(-1)
    fg_keep = tf.equal(rpn_label, 1)                    # anchors labeled 1: foreground
    rpn_keep = tf.where(tf.not_equal(rpn_label, -1))    # anchors not labeled -1: positive and negative samples
    rpn_cls_score = tf.gather(rpn_cls_score, rpn_keep)  # shape (?,1,2), scores of the positive/negative samples
    rpn_label = tf.gather(rpn_label, rpn_keep)          # shape (?,1), labels of the positive/negative samples
    # cross-entropy loss from the labels and scores of the positive/negative samples
    rpn_cross_entropy_n = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=rpn_label, logits=rpn_cls_score)

    # box regression loss
    rpn_bbox_pred = bbox_pred                # regression predictions, shape (?,?,?,40)
    rpn_bbox_targets = rpn_data[1]           # regression targets computed from the GT boxes
    rpn_bbox_inside_weights = rpn_data[2]    # 1 for positive samples, 0 otherwise
    rpn_bbox_outside_weights = rpn_data[3]   # weights balancing the classification and regression losses

    # keep only the entries belonging to positive/negative samples
    rpn_bbox_pred = tf.gather(tf.reshape(rpn_bbox_pred, [-1, 4]), rpn_keep)                        # shape (?,1,4), predicted boxes
    rpn_bbox_targets = tf.gather(tf.reshape(rpn_bbox_targets, [-1, 4]), rpn_keep)                  # shape (?,1,4), GT targets
    rpn_bbox_inside_weights = tf.gather(tf.reshape(rpn_bbox_inside_weights, [-1, 4]), rpn_keep)    # shape (?,1,4)
    rpn_bbox_outside_weights = tf.gather(tf.reshape(rpn_bbox_outside_weights, [-1, 4]), rpn_keep)  # shape (?,1,4)

    # only positive samples contribute to the smooth L1 loss (the inside weights zero out the rest)
    rpn_loss_box_n = tf.reduce_sum(
        rpn_bbox_outside_weights * smooth_l1_dist(
            rpn_bbox_inside_weights * (rpn_bbox_pred - rpn_bbox_targets)),
        reduction_indices=[1])
    rpn_loss_box = tf.reduce_sum(rpn_loss_box_n) / (tf.reduce_sum(tf.cast(fg_keep, tf.float32)) + 1)  # average regression loss
    rpn_cross_entropy = tf.reduce_mean(rpn_cross_entropy_n)  # average classification cross-entropy loss

    model_loss = rpn_cross_entropy + rpn_loss_box  # model loss = classification cross-entropy + smooth L1 regression loss
    # the third loss from the paper, the x-axis side-refinement loss, does not seem to be included here

    # tf.get_collection() selects a specific set of variables
    regularization_losses = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
    total_loss = tf.add_n(regularization_losses) + model_loss  # adding a regularization term helps prevent overfitting

    tf.summary.scalar('model_loss', model_loss)
    tf.summary.scalar('total_loss', total_loss)
    tf.summary.scalar('rpn_cross_entropy', rpn_cross_entropy)
    tf.summary.scalar('rpn_loss_box', rpn_loss_box)

    return total_loss, model_loss, rpn_cross_entropy, rpn_loss_box
```
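The regression branch above calls a smooth_l1_dist() helper that is not reproduced in this post. For reference, here is a minimal sketch of the standard Fast R-CNN smooth L1 distance written in the same TF 1.x style; the function name matches the call above, but the sigma parameter and exact formulation are assumptions, so check the repo for the actual implementation:

```python
import tensorflow as tf

def smooth_l1_dist(deltas, sigma2=9.0, name='smooth_l1_dist'):
    """Element-wise smooth L1: 0.5 * sigma2 * x^2 when |x| < 1/sigma2, else |x| - 0.5/sigma2."""
    with tf.name_scope(name):
        deltas_abs = tf.abs(deltas)
        smooth_sign = tf.cast(tf.less(deltas_abs, 1.0 / sigma2), tf.float32)
        return tf.square(deltas) * 0.5 * sigma2 * smooth_sign + \
               (deltas_abs - 0.5 / sigma2) * (1.0 - smooth_sign)
```

Because the inside weights are 0 for background and don't-care anchors, only the positive anchors actually contribute to this term, which is why rpn_loss_box is normalized by the number of foreground anchors (plus 1 to avoid division by zero).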
The loss() function has two main steps:
- Step 1: call anchor_target_layer() to obtain the RPN training data:
  rpn_data = [rpn_labels, rpn_bbox_targets, rpn_bbox_inside_weights, rpn_bbox_outside_weights]
  rpn_labels: the labels assigned to the anchors (1 foreground, 0 background, -1 don't care)
  rpn_bbox_targets: the regression targets, i.e. the offsets from each anchor to its matched GT box
  rpn_bbox_inside_weights: inside weights, 1 for foreground anchors, 0 for background and don't-care anchors
  rpn_bbox_outside_weights: outside weights, 1 for foreground and 0 for background in this implementation, used to balance the classification and regression losses
- Step 2: combine the RPN data with the image info and the GT annotations to compute the classification loss and the regression loss.
The code for step 2 is the loss() function shown above, so it is not repeated here; the rest of this post focuses on step 1, calling anchor_target_layer() to obtain the RPN data.
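Note that anchor_target_layer() itself (analyzed below) is plain numpy code, while loss() calls it from inside the TF graph. In this repo that is done with a small wrapper around tf.py_func; the sketch below shows the general idea, but the import path, argument list and dtypes here are assumptions, so refer to the repo for the real wrapper:

```python
import tensorflow as tf
from utils.rpn_msr.anchor_target_layer import anchor_target_layer as anchor_target_layer_py  # path assumed

def anchor_target_layer(cls_pred, gt_boxes, im_info, scope_name):
    # Run the numpy target-assignment code as a graph op via tf.py_func.
    with tf.variable_scope(scope_name):
        rpn_labels, rpn_bbox_targets, rpn_bbox_inside_weights, rpn_bbox_outside_weights = tf.py_func(
            anchor_target_layer_py,
            [cls_pred, gt_boxes, im_info, [16, ], [16]],
            [tf.float32, tf.float32, tf.float32, tf.float32])
        # The labels feed sparse_softmax_cross_entropy_with_logits, so cast them to int32.
        rpn_labels = tf.convert_to_tensor(tf.cast(rpn_labels, tf.int32), name='rpn_labels')
        rpn_bbox_targets = tf.convert_to_tensor(rpn_bbox_targets, name='rpn_bbox_targets')
        rpn_bbox_inside_weights = tf.convert_to_tensor(rpn_bbox_inside_weights, name='rpn_bbox_inside_weights')
        rpn_bbox_outside_weights = tf.convert_to_tensor(rpn_bbox_outside_weights, name='rpn_bbox_outside_weights')
        return [rpn_labels, rpn_bbox_targets, rpn_bbox_inside_weights, rpn_bbox_outside_weights]
```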
anchor_target_layer() function flow:
anchor_target_layer() function code:
This is probably the most complex function in CTPN.
"""
Assign anchors to ground-truth targets. Produces anchor classification
labels and bounding-box regression targets.
将锚点分配给真实目标。 生成锚点分类标签和边界框回归目标。
Parameters
----------
rpn_cls_score: (1, H, W, Ax2) bg/fg scores of previous conv layer 是前景还是背景的分类概率
gt_boxes: (G, 5) vstack of [x1, y1, x2, y2, class] 真实标签
im_info: a list of [image_height, image_width, scale_ratios] 图像信息
_feat_stride: the downsampling ratio of feature map to the original input image 原始图像到特征图的下采样率
anchor_scales: the scales to the basic_anchor (basic anchor is [16, 16]) 基本锚点的大小
----------
Returns 返回值
----------
rpn_labels : (HxWxA, 1), for each anchor, 0 denotes bg, 1 fg, -1 dontcare 0表示背景,1是前景,-1不关心
rpn_bbox_targets: (HxWxA, 4), distances of the anchors to the gt_boxes(may contains some transform)that are the regression objectives 锚点到作为回归目标的gt_boxes(可能包含一些变换)的距离
rpn_bbox_inside_weights: (HxWxA, 4) weights of each boxes, mainly accepts hyper param in cfg 每个box的权重,主要在cfg中接受超级参数
rpn_bbox_outside_weights: (HxWxA, 4) used to balance the fg/bg, 用于平衡前景背景beacuse the numbers of bgs and fgs mays significiantly different 因为ngs和fgs的数量可能有很大的不同
"""
def anchor_target_layer(rpn_cls_score, gt_boxes, im_info, _feat_stride=[16, ], anchor_scales=[16, ]):_anchors = generate_anchors(scales=np.array(anchor_scales)) # 生成基本的anchor,一共10个,shape(10,4)_num_anchors = _anchors.shape[0] # 10个anchorif DEBUG:print('anchors:')print(_anchors)print('anchor shapes:')print(np.hstack((_anchors[:, 2::4] - _anchors[:, 0::4],_anchors[:, 3::4] - _anchors[:, 1::4],)))_counts = cfg.EPS_sums = np.zeros((1, 4))_squared_sums = np.zeros((1, 4))_fg_sum = 0_bg_sum = 0_count = 0# allow boxes to sit over the edge by a small amount 允许boxes超出图像边界的阈值_allowed_border = 0 #不允许超出图像边界# map of shape (..., H, W)# height, width = rpn_cls_score.shape[1:3]im_info = im_info[0] #获取图像的高宽及通道数if DEBUG:print("im_info: ", im_info)# 在feature-map上定位anchor,并加上delta,得到在实际图像中anchor的真实坐标# Algorithm:# for each (H, W) location i# generate 9 anchor boxes centered on cell i# apply predicted bbox deltas at cell i to each of the 9 anchors# filter out-of-image anchors# measure GT overlap# assert语句的格式是【assert 表达式,返回数据】,当表达式为False时则触发AssertionError异常assert rpn_cls_score.shape[0] == 1, 'Only single item batches are supported' # 一次只能传入一张图# map of shape (..., H, W)height, width = rpn_cls_score.shape[1:3] # feature-map的高宽if DEBUG:print('AnchorTargetLayer: height', height, 'width', width)print('')print('im_size: ({}, {})'.format(im_info[0], im_info[1]))print('scale: {}'.format(im_info[2]))print('height, width: ({}, {})'.format(height, width))print('rpn: gt_boxes.shape', gt_boxes.shape)print('rpn: gt_boxes', gt_boxes)# 1. Generate proposals from bbox deltas and shifted anchorsshift_x = np.arange(0, width) * _feat_stride #_feat_stride=[16]shift_y = np.arange(0, height) * _feat_strideshift_x, shift_y = np.meshgrid(shift_x, shift_y) # in W H order# K is H x W 1938=38*51shifts = np.vstack((shift_x.ravel(), shift_y.ravel(), # ravel()将多维数组转换为一维数组,如果没有必要,不会产生源数据的副本shift_x.ravel(), shift_y.ravel())).transpose() # 生成feature-map和真实image上anchor之间的偏移量,shape(1938,4)# add A anchors (1, A, 4) to# cell K shifts (K, 1, 4) to get# shift anchors (K, A, 4)# reshape to (K*A, 4) shifted anchorsA = _num_anchors # 10个anchorK = shifts.shape[0] # 50*38,feature-map的宽乘高的大小all_anchors = (_anchors.reshape((1, A, 4)) +shifts.reshape((1, K, 4)).transpose((1, 0, 2))) # 相当于复制宽高的维度,然后相加 shape(1938,10,4)all_anchors = all_anchors.reshape((K * A, 4)) # shape(19380,4)total_anchors = int(K * A) # 1938*10=19380# only keep anchors inside the image# 仅保留那些还在图像内部的anchor,超出图像的都删掉inds_inside = np.where((all_anchors[:, 0] >= -_allowed_border) &(all_anchors[:, 1] >= -_allowed_border) &(all_anchors[:, 2] < im_info[1] + _allowed_border) & # width(all_anchors[:, 3] < im_info[0] + _allowed_border) # height)[0] # 获得在图像内部的anchor索引if DEBUG:print('total_anchors', total_anchors)print('inds_inside', len(inds_inside))# keep only inside anchorsanchors = all_anchors[inds_inside, :] # 根据在图像内部的anchor索引获取那些在图像内的anchorif DEBUG:print('anchors.shape', anchors.shape)# 至此,anchor准备好了# --------------------------------------------------------------# label: 1 is positive, 0 is negative, -1 is dont care 1是前景,0是背景,-1不关心# (A)labels = np.empty((len(inds_inside),), dtype=np.float32) #根据在图像内部的anchor数量,创建标签列表labels.fill(-1) # 初始化label,均为-1# overlaps between the anchors and the gt boxes# overlaps (ex, gt), shape is A x G# 计算anchor和gt-box的overlap,用来给anchor上标签overlaps = bbox_overlaps(np.ascontiguousarray(anchors, dtype=np.float),np.ascontiguousarray(gt_boxes, dtype=np.float)) # 假设anchors有x个,gt_boxes有y个,返回的是一个(x,y)的数组# 存放每一个anchor和每一个gtbox之间的overlapargmax_overlaps = overlaps.argmax(axis=1) # 
(A)#找到和每一个gtbox,overlap最大的那个anchormax_overlaps = overlaps[np.arange(len(inds_inside)), argmax_overlaps]gt_argmax_overlaps = overlaps.argmax(axis=0) # G#找到每个位置上10个anchor中与gtbox,overlap最大的那个gt_max_overlaps = overlaps[gt_argmax_overlaps,np.arange(overlaps.shape[1])]gt_argmax_overlaps = np.where(overlaps == gt_max_overlaps)[0]if not cfg.RPN_CLOBBER_POSITIVES:# assign bg labels first so that positive labels can clobber themlabels[max_overlaps < cfg.RPN_NEGATIVE_OVERLAP] = 0 # 先给背景上标签,小于0.3overlap的# fg label: for each gt, anchor with highest overlaplabels[gt_argmax_overlaps] = 1 # 每个位置上的10个anchor中overlap最大的认为是前景# fg label: above threshold IOUlabels[max_overlaps >= cfg.RPN_POSITIVE_OVERLAP] = 1 # overlap大于0.7的认为是前景if cfg.RPN_CLOBBER_POSITIVES:# assign bg labels last so that negative labels can clobber positiveslabels[max_overlaps < cfg.RPN_NEGATIVE_OVERLAP] = 0# subsample positive labels if we have too many# 对正样本进行采样,如果正样本的数量太多的话# 限制正样本的数量不超过128个num_fg = int(cfg.RPN_FG_FRACTION * cfg.RPN_BATCHSIZE)fg_inds = np.where(labels == 1)[0]if len(fg_inds) > num_fg:disable_inds = npr.choice(fg_inds, size=(len(fg_inds) - num_fg), replace=False) # 随机去除掉一些正样本labels[disable_inds] = -1 # 变为-1# subsample negative labels if we have too many# 对负样本进行采样,如果负样本的数量太多的话# 正负样本总数是256,限制正样本数目最多128,# 如果正样本数量小于128,差的那些就用负样本补上,凑齐256个样本num_bg = cfg.RPN_BATCHSIZE - np.sum(labels == 1)bg_inds = np.where(labels == 0)[0]if len(bg_inds) > num_bg:disable_inds = npr.choice(bg_inds, size=(len(bg_inds) - num_bg), replace=False)labels[disable_inds] = -1# print "was %s inds, disabling %s, now %s inds" % (# len(bg_inds), len(disable_inds), np.sum(labels == 0))# 至此, 上好标签,开始计算rpn-box的真值# --------------------------------------------------------------bbox_targets = np.zeros((len(inds_inside), 4), dtype=np.float32)bbox_targets = _compute_targets(anchors, gt_boxes[argmax_overlaps, :]) # 根据anchor和gtbox计算得真值(anchor和gtbox之间的偏差)bbox_inside_weights = np.zeros((len(inds_inside), 4), dtype=np.float32)bbox_inside_weights[labels == 1, :] = np.array(cfg.RPN_BBOX_INSIDE_WEIGHTS) # 内部权重,前景就给1,其他是0bbox_outside_weights = np.zeros((len(inds_inside), 4), dtype=np.float32)if cfg.RPN_POSITIVE_WEIGHT < 0: # 暂时使用uniform 权重,也就是正样本是1,负样本是0# uniform weighting of examples (given non-uniform sampling)num_examples = np.sum(labels >= 0) + 1# positive_weights = np.ones((1, 4)) * 1.0 / num_examples# negative_weights = np.ones((1, 4)) * 1.0 / num_examplespositive_weights = np.ones((1, 4))negative_weights = np.zeros((1, 4))else:assert ((cfg.RPN_POSITIVE_WEIGHT > 0) &(cfg.RPN_POSITIVE_WEIGHT < 1))positive_weights = (cfg.RPN_POSITIVE_WEIGHT /(np.sum(labels == 1)) + 1)negative_weights = ((1.0 - cfg.RPN_POSITIVE_WEIGHT) /(np.sum(labels == 0)) + 1)bbox_outside_weights[labels == 1, :] = positive_weights # 外部权重,前景是1,背景是0bbox_outside_weights[labels == 0, :] = negative_weightsif DEBUG:_sums += bbox_targets[labels == 1, :].sum(axis=0)_squared_sums += (bbox_targets[labels == 1, :] ** 2).sum(axis=0)_counts += np.sum(labels == 1)means = _sums / _countsstds = np.sqrt(_squared_sums / _counts - means ** 2)print('means:')print(means)print('stdevs:')print(stds)# map up to original set of anchors# 一开始是将超出图像范围的anchor直接丢掉的,现在在加回来labels = _unmap(labels, total_anchors, inds_inside, fill=-1) # 这些anchor的label是-1,也即dontcarebbox_targets = _unmap(bbox_targets, total_anchors, inds_inside, fill=0) # 这些anchor的真值是0,也即没有值bbox_inside_weights = _unmap(bbox_inside_weights, total_anchors, inds_inside, fill=0) # 内部权重以0填充bbox_outside_weights = _unmap(bbox_outside_weights, total_anchors, inds_inside, fill=0) # 
外部权重以0填充if DEBUG:print('rpn: max max_overlap', np.max(max_overlaps))print('rpn: num_positive', np.sum(labels == 1))print('rpn: num_negative', np.sum(labels == 0))_fg_sum += np.sum(labels == 1)_bg_sum += np.sum(labels == 0)_count += 1print('rpn: num_positive avg', _fg_sum / _count)print('rpn: num_negative avg', _bg_sum / _count)# labelslabels = labels.reshape((1, height, width, A)) # reshap一下label A = _num_anchors # 10个anchorrpn_labels = labels# bbox_targetsbbox_targets = bbox_targets \.reshape((1, height, width, A * 4)) # reshaperpn_bbox_targets = bbox_targets# bbox_inside_weightsbbox_inside_weights = bbox_inside_weights \.reshape((1, height, width, A * 4))rpn_bbox_inside_weights = bbox_inside_weights# bbox_outside_weightsbbox_outside_weights = bbox_outside_weights \.reshape((1, height, width, A * 4))rpn_bbox_outside_weights = bbox_outside_weightsif DEBUG:print("anchor target set")return rpn_labels, rpn_bbox_targets, rpn_bbox_inside_weights, rpn_bbox_outside_weights
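anchor_target_layer() leans on a few helpers that are not listed in this post: generate_anchors(), bbox_overlaps(), _compute_targets() and _unmap(). For reference, here is a minimal numpy sketch of two of them: generate_anchors() producing the 10 fixed-width CTPN anchors, and _unmap() scattering the results for the kept anchors back into the full anchor set. The anchor heights below are the values commonly used in CTPN implementations, so treat the exact numbers and signatures as assumptions and check the repo:

```python
import numpy as np

def generate_anchors(scales=np.array([16]), base_size=16,
                     heights=(11, 16, 23, 33, 48, 68, 97, 139, 198, 283)):
    """Return a (10, 4) array of [x1, y1, x2, y2] anchors.

    CTPN uses a fixed anchor width (16 px, the feature stride) and 10 heights,
    all centered on the same 16x16 base cell.
    """
    base_anchor = np.array([0, 0, base_size - 1, base_size - 1], dtype=np.float32)
    ctr_x = (base_anchor[0] + base_anchor[2]) * 0.5
    ctr_y = (base_anchor[1] + base_anchor[3]) * 0.5
    anchors = []
    for h in heights:
        w = base_size  # the width is fixed; only the height varies
        anchors.append([ctr_x - 0.5 * (w - 1), ctr_y - 0.5 * (h - 1),
                        ctr_x + 0.5 * (w - 1), ctr_y + 0.5 * (h - 1)])
    return np.array(anchors, dtype=np.float32)

def _unmap(data, count, inds, fill=0):
    """Scatter `data` (defined only for the in-image anchors `inds`) back into
    an array covering all `count` anchors, padding the rest with `fill`."""
    if len(data.shape) == 1:
        ret = np.empty((count,), dtype=np.float32)
        ret.fill(fill)
        ret[inds] = data
    else:
        ret = np.empty((count,) + data.shape[1:], dtype=np.float32)
        ret.fill(fill)
        ret[inds, :] = data
    return ret
```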
The bbox_transform(ex_rois, gt_rois) function used here (via _compute_targets()) computes the normalized offsets from the given boxes, i.e. the anchors, to the GT boxes.
```python
def bbox_transform(ex_rois, gt_rois):
    """
    computes the distance from ground-truth boxes to the given boxes, normed by their size
    :param ex_rois: n * 4 numpy array, given boxes
    :param gt_rois: n * 4 numpy array, ground-truth boxes
    :return: deltas: n * 4 numpy array, regression targets
    """
    ex_widths = ex_rois[:, 2] - ex_rois[:, 0] + 1.0
    ex_heights = ex_rois[:, 3] - ex_rois[:, 1] + 1.0
    ex_ctr_x = ex_rois[:, 0] + 0.5 * ex_widths
    ex_ctr_y = ex_rois[:, 1] + 0.5 * ex_heights

    assert np.min(ex_widths) > 0.1 and np.min(ex_heights) > 0.1, \
        'Invalid boxes found: {} {}'.format(ex_rois[np.argmin(ex_widths), :], ex_rois[np.argmin(ex_heights), :])

    gt_widths = gt_rois[:, 2] - gt_rois[:, 0] + 1.0
    gt_heights = gt_rois[:, 3] - gt_rois[:, 1] + 1.0
    gt_ctr_x = gt_rois[:, 0] + 0.5 * gt_widths
    gt_ctr_y = gt_rois[:, 1] + 0.5 * gt_heights

    # warnings.catch_warnings()
    # warnings.filterwarnings('error')
    targets_dx = (gt_ctr_x - ex_ctr_x) / ex_widths
    targets_dy = (gt_ctr_y - ex_ctr_y) / ex_heights
    targets_dw = np.log(gt_widths / ex_widths)
    targets_dh = np.log(gt_heights / ex_heights)

    targets = np.vstack((targets_dx, targets_dy, targets_dw, targets_dh)).transpose()
    return targets
```
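So the four targets are dx = (gt_ctr_x - ex_ctr_x) / ex_width, dy = (gt_ctr_y - ex_ctr_y) / ex_height, dw = log(gt_width / ex_width) and dh = log(gt_height / ex_height): center offsets normalized by the anchor size plus log size ratios. A quick sanity check with a made-up anchor and GT box (illustrative values only, assuming bbox_transform is importable as shown above):

```python
import numpy as np

# A made-up 16x16 anchor and a GT box with the same width but twice the height.
anchor = np.array([[0.0, 0.0, 15.0, 15.0]])   # x1, y1, x2, y2 -> 16x16, center (8, 8)
gt     = np.array([[0.0, -8.0, 15.0, 23.0]])  # 16x32, same center (8, 8)

print(bbox_transform(anchor, gt))
# -> approximately [[0. 0. 0. 0.6931]]  (dx = dy = dw = 0, dh = log(32/16))
```

Note that the CTPN paper only regresses the vertical coordinates (y center and height) of each fixed-width anchor, plus a side-refinement offset, while this helper computes all four deltas in the usual Faster R-CNN form.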