nms_and_iou
Tim Chen(motion$) Lv5

IoU (Intersection over Union)

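  • Given two bounding boxes, compute the ratio of the area of their intersection to the area of their union.
  • A bounding box is usually represented in one of two ways: two corner points (x1, y1, x2, y2), or a center point plus width and height (x, y, w, h). In code, the second form is converted to the corner form before the computation.
  • code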
import numpy as np

# params: boxes1: [..., 4] array of [xmin, ymin, xmax, ymax]; must broadcast against boxes2
# params: boxes2: [num, [xmin, ymin, xmax, ymax]]
# returns: iou, array of length len(boxes2)

def bboxes_iou(boxes1, boxes2):
    boxes1 = np.array(boxes1)
    boxes2 = np.array(boxes2)

    # Area of each box
    boxes1_area = (boxes1[..., 2] - boxes1[..., 0]) * (boxes1[..., 3] - boxes1[..., 1])
    boxes2_area = (boxes2[..., 2] - boxes2[..., 0]) * (boxes2[..., 3] - boxes2[..., 1])

    # Intersection: the larger of the two top-left corners and
    # the smaller of the two bottom-right corners
    left_up = np.maximum(boxes1[..., :2], boxes2[..., :2])
    right_down = np.minimum(boxes1[..., 2:], boxes2[..., 2:])

    # Width/height are clipped at 0 so non-overlapping boxes get area 0
    inter_section = np.maximum(right_down - left_up, 0.0)
    inter_area = inter_section[..., 0] * inter_section[..., 1]

    # Union
    union_area = boxes1_area + boxes2_area - inter_area

    # IoU, clamped to a small positive value so it is never exactly 0
    iou = np.maximum(1.0 * inter_area / union_area, np.finfo(np.float32).eps)

    return iou
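
As a quick sanity check of the logic above, here is a minimal, self-contained sketch that converts center-format boxes to corner format and computes the IoU of one pair by hand. It is not the author's code: the helper names `xywh_to_xyxy` and `pair_iou` are hypothetical and exist only for this illustration.

```python
import numpy as np

def xywh_to_xyxy(boxes):
    """Convert (center x, center y, w, h) boxes to (xmin, ymin, xmax, ymax)."""
    boxes = np.asarray(boxes, dtype=np.float64)
    half = boxes[..., 2:4] / 2.0
    return np.concatenate([boxes[..., :2] - half, boxes[..., :2] + half], axis=-1)

def pair_iou(a, b):
    """IoU of two corner-format boxes a and b."""
    left_up = np.maximum(a[:2], b[:2])        # top-left of the overlap
    right_down = np.minimum(a[2:], b[2:])     # bottom-right of the overlap
    wh = np.maximum(right_down - left_up, 0.0)
    inter = wh[0] * wh[1]
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# Two 2x2 boxes centered at (1, 1) and (2, 2)
boxes = xywh_to_xyxy([[1.0, 1.0, 2.0, 2.0], [2.0, 2.0, 2.0, 2.0]])
# Corner form: [0, 0, 2, 2] and [1, 1, 3, 3]
# Intersection = 1, union = 4 + 4 - 1 = 7, so IoU = 1/7
value = pair_iou(boxes[0], boxes[1])
print(value)
```

The overlap is the unit square from (1, 1) to (2, 2), which is why the result is 1/7 (about 0.143).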

NMS (Non-Maximum Suppression)

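  • In object detection, the head predicts a large number of boxes, each carrying a class, a class score, and coordinates. Many of these predictions are redundant, and NMS is used to remove them. The procedure is:
    • (1) Collect all predicted boxes of one class in one image.
    • (2) Sort them by class score and keep the box with the highest score.
    • (3) Use that highest-scoring box as the pivot and compute its IoU with each of the remaining boxes.
    • (4) Choose an iou_threshold; boxes whose IoU with the pivot exceeds iou_threshold are treated as redundant.
    • (5) There are two ways to handle redundant boxes: hard NMS and soft NMS. Hard NMS sets the scores of all such boxes to 0. Soft NMS decays their scores with a Gaussian function: the larger the IoU, the stronger the suppression; the smaller the IoU, the weaker.
    • (6) Soft NMS exists to avoid wrongly deleting the lower-scoring box when two objects of the same class are close together.
  • code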
    import numpy as np

    # params: bboxes: (xmin, ymin, xmax, ymax, score, class)
    # params: iou_threshold, IoU above which a box counts as redundant
    # params: sigma, scale value of the Gaussian used by soft-nms
    # params: method, 'nms' or 'soft-nms'

    # return: best_bboxes

    def nms(bboxes, iou_threshold, sigma=0.3, method='nms'):
        assert method in ['nms', 'soft-nms']
        bboxes = np.array(bboxes, dtype=np.float32)

        # Unique class ids present in the image (column 5)
        classes_in_img = list(set(bboxes[:, 5]))

        # Boxes that survive suppression
        best_bboxes = []

        # Run NMS separately for each class
        for cls in classes_in_img:
            # Select the boxes of the current class
            cls_mask = (bboxes[:, 5] == cls)
            cls_bboxes = bboxes[cls_mask]

            # NMS loop
            while len(cls_bboxes) > 0:
                # Pick the box with the highest score and keep it
                max_ind = np.argmax(cls_bboxes[:, 4])
                best_bbox = cls_bboxes[max_ind]
                best_bboxes.append(best_bbox)

                # Remove the picked box from cls_bboxes
                cls_bboxes = np.concatenate([cls_bboxes[:max_ind], cls_bboxes[max_ind + 1:]])

                # IoU between the picked box and every remaining box
                iou = bboxes_iou(best_bbox[np.newaxis, :4], cls_bboxes[:, :4])

                # Per-box score multiplier, 1.0 means "leave unchanged"
                weight = np.ones(len(iou), dtype=np.float32)

                # Hard NMS: zero out boxes that overlap too much
                if method == 'nms':
                    iou_mask = iou > iou_threshold
                    weight[iou_mask] = 0.0

                # Soft NMS: Gaussian decay. The weight is never exactly 0,
                # so boxes are only down-weighted, not dropped, by this step.
                if method == 'soft-nms':
                    weight = np.exp(-(1.0 * iou ** 2 / sigma))

                # Update the scores
                cls_bboxes[:, 4] *= weight

                # Drop boxes whose score has become 0
                score_mask = cls_bboxes[:, 4] > 0
                cls_bboxes = cls_bboxes[score_mask]

        return best_bboxes
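
To make the difference between the two strategies in step (5) concrete, the sketch below computes the score multiplier for a few IoU values under each method. The function name `suppression_weight` and the sample IoU values are hypothetical and chosen only for illustration; the Gaussian form exp(-iou² / sigma) is the one used in the `soft-nms` branch above.

```python
import numpy as np

def suppression_weight(iou, iou_threshold=0.5, sigma=0.3, method="nms"):
    """Return the factor each remaining box's score is multiplied by."""
    iou = np.asarray(iou, dtype=np.float64)
    if method == "nms":
        # Hard NMS: a box is either kept untouched (1.0) or removed (0.0)
        return np.where(iou > iou_threshold, 0.0, 1.0)
    if method == "soft-nms":
        # Soft NMS: smooth decay, stronger for larger overlap
        return np.exp(-(iou ** 2) / sigma)
    raise ValueError("method must be 'nms' or 'soft-nms'")

ious = np.array([0.0, 0.3, 0.6, 0.9])
hard = suppression_weight(ious, method="nms")
soft = suppression_weight(ious, method="soft-nms")

print(hard)               # [1. 1. 0. 0.]
print(np.round(soft, 3))  # roughly 1.0, 0.741, 0.301, 0.067
```

With hard NMS the box at IoU 0.6 disappears entirely, while soft NMS only reduces its score to about 30% of the original, which is what lets a second, genuinely distinct object survive in crowded scenes. Because the soft weights never reach 0, a final score threshold is usually needed to discard heavily decayed boxes.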

Another NMS implementation, from Bubbling.. on Bilibili

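  • (0) The backbone output has the format boxes: [bs, all_boxes, 4+1+num_classes]
  • (1) First convert the coordinates.
  • (2) Then take all boxes of one image and filter them by conf_threshold.
  • (3) For each box, look at its num_classes class predictions. With 20 classes, for example, the largest of the 20 scores (max_score) gives the class predicted for that box; also record the corresponding index.
  • (4) Group the boxes by class and run the following loop for each class:
    • Take the conf_score of the boxes (the foreground/background score) and sort it in descending order.
    • Keep the highest-scoring box, then compute the IoU between it and the remaining boxes.
    • nms_threshold is the IoU cutoff: boxes whose IoU is above nms_threshold are removed.
    • Repeat until all boxes of the current class have been processed.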

#coding=utf-8

import numpy as np

def nms(boxes, conf_threshold=0.5, nms_threshold=0.4):
    # boxes: [bs, all_boxes, 4+1+num_classes]
    boxes = np.array(boxes, dtype=np.float64)  # float copy, the caller's data is not modified
    bs = np.shape(boxes)[0]

    # Convert (center x, center y, w, h) to top-left and bottom-right corners
    shape_boxes = np.zeros_like(boxes[:, :, :4])
    shape_boxes[:, :, 0] = boxes[:, :, 0] - boxes[:, :, 2] / 2
    shape_boxes[:, :, 1] = boxes[:, :, 1] - boxes[:, :, 3] / 2
    shape_boxes[:, :, 2] = boxes[:, :, 0] + boxes[:, :, 2] / 2
    shape_boxes[:, :, 3] = boxes[:, :, 1] + boxes[:, :, 3] / 2

    boxes[:, :, :4] = shape_boxes
    output = []

    for i in range(bs):
        # All predicted boxes of one image
        prediction = boxes[i]
        # Foreground (objectness) score of each box
        score = prediction[:, 4]
        # Keep boxes whose score exceeds conf_threshold
        mask = score > conf_threshold
        detections = prediction[mask]

        # For each box, find its best class score and the index of that class
        class_conf = np.expand_dims(np.max(detections[:, 5:], axis=-1), axis=-1)
        class_pred = np.expand_dims(np.argmax(detections[:, 5:], axis=-1), axis=-1)

        # Columns: x1, y1, x2, y2, obj_conf, class_conf, class_pred
        detections = np.concatenate([detections[:, :5], class_conf, class_pred], axis=1)

        unique_class = np.unique(detections[:, -1])
        if len(unique_class) == 0:
            continue

        best_box = []

        for c in unique_class:
            cls_mask = detections[:, -1] == c
            detection = detections[cls_mask]
            scores = detection[:, 4]

            # Sort by score, highest first
            arg_sort = np.argsort(scores)[::-1]
            detection = detection[arg_sort]

            while len(detection) != 0:
                best_box.append(detection[0])
                if len(detection) == 1:
                    break
                # bboxes_iou is the IoU helper defined in the IoU section
                ious = bboxes_iou(detection[0, :4], detection[1:, :4])
                detection = detection[1:][ious < nms_threshold]
        output.append(best_box)
    # Images can keep different numbers of boxes, so return a list
    # instead of packing the ragged result into one array
    return output
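
Steps (2) and (3) are where shape mistakes tend to creep in, so here is a small sketch of just that part on a toy array with three classes. The values are made up for illustration, and `select_classes` is a hypothetical helper name, not part of the original implementation; the boxes are assumed to be in corner format already.

```python
import numpy as np

def select_classes(prediction, conf_threshold=0.5):
    """Filter boxes by objectness and attach the best class.

    prediction: [num_boxes, 4 + 1 + num_classes]
    returns:    [kept, 7] with columns
                x1, y1, x2, y2, obj_conf, class_conf, class_id
    """
    prediction = np.asarray(prediction, dtype=np.float64)
    # Step (2): keep boxes whose objectness exceeds the threshold
    kept = prediction[prediction[:, 4] > conf_threshold]
    # Step (3): best class score and its index for every kept box
    class_conf = np.max(kept[:, 5:], axis=-1, keepdims=True)
    class_id = np.argmax(kept[:, 5:], axis=-1)[:, np.newaxis].astype(np.float64)
    return np.concatenate([kept[:, :5], class_conf, class_id], axis=1)

toy = [
    # x1, y1, x2, y2, obj, class scores (3 classes)
    [0, 0, 10, 10, 0.9, 0.1, 0.7, 0.2],
    [1, 1, 11, 11, 0.3, 0.8, 0.1, 0.1],   # dropped: objectness below 0.5
    [5, 5, 15, 15, 0.6, 0.2, 0.2, 0.6],
]
result = select_classes(toy)
print(result.shape)    # (2, 7)
print(result[:, -1])   # [1. 2.]
```

Slicing with `[:, :5]` keeps the four coordinates plus the objectness score, so after concatenation every row has exactly 7 columns regardless of how many classes the model predicts.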
