nms_and_iou

IoU (Intersection over Union)

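Given two bounding boxes, IoU is the ratio of the area of their intersection to the area of their union.
Bounding-box coordinates are usually written in one of two ways: as two corner points (x1, y1, x2, y2), or as a center point plus width and height (x, y, w, h). In code, the second form is converted to the two-corner form before the computation.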
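The conversion from the center form to the two-corner form could look roughly like this. It is only a sketch: the helper name `xywh_to_xyxy` is made up for illustration, and it assumes (x, y) is the box center.

```python
import numpy as np

def xywh_to_xyxy(boxes):
    """Convert [..., (cx, cy, w, h)] to [..., (xmin, ymin, xmax, ymax)].

    Hypothetical helper, assuming (cx, cy) is the box center.
    """
    boxes = np.asarray(boxes, dtype=np.float32)
    half_wh = boxes[..., 2:4] / 2.0
    top_left = boxes[..., :2] - half_wh
    bottom_right = boxes[..., :2] + half_wh
    return np.concatenate([top_left, bottom_right], axis=-1)

# A box centered at (5, 5) with width 4 and height 2
corners = xywh_to_xyxy([[5, 5, 4, 2]])
print(corners)
```

The box spans 2 units either side of the center horizontally and 1 unit vertically, so its corners are (3, 4) and (7, 6).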
code

import numpy as np

# params: boxes1: [, [xmin, ymin, xmax, ymax]]
# params: boxes2: [num, [xmin, ymin, xmax, ymax]]
# returns: iou, one value per box in boxes2 (length=len(boxes2))

def bboxes_iou(boxes1, boxes2):
	boxes1 = np.array(boxes1)
	boxes2 = np.array(boxes2)
	
	# Area of each box
	boxes1_area = (boxes1[..., 2] - boxes1[..., 0]) * (boxes1[..., 3] - boxes1[..., 1])
	boxes2_area = (boxes2[..., 2] - boxes2[..., 0]) * (boxes2[..., 3] - boxes2[..., 1])
	
	# Intersection: its top-left corner is the element-wise max, its bottom-right the element-wise min
	left_up = np.maximum(boxes1[..., :2], boxes2[..., :2])
	right_down = np.minimum(boxes1[..., 2:], boxes2[..., 2:])
	
	inter_section = np.maximum(right_down - left_up, 0.0)
	inter_area = inter_section[..., 0] * inter_section[..., 1]
	
	# Union
	union_area = boxes1_area + boxes2_area - inter_area
	
	# IoU, floored at float32 eps so that it is never exactly 0
	iou = np.maximum(1.0 * inter_area / union_area, np.finfo(np.float32).eps)
	
	return iou
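
To sanity-check a vectorized IoU like the one above, a scalar version is handy for comparing against hand-computed values. This is a minimal sketch with a made-up helper name, `iou_scalar`. Unlike the function above, it returns exactly 0 for disjoint boxes instead of a small epsilon.

```python
def iou_scalar(a, b):
    """IoU of two boxes given as (xmin, ymin, xmax, ymax). Hypothetical helper."""
    # Width and height of the overlap, clamped at 0 when the boxes do not touch
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Two 2x2 boxes overlapping in a 1x1 square: intersection 1, union 4 + 4 - 1 = 7
overlap = iou_scalar((0, 0, 2, 2), (1, 1, 3, 3))
print(overlap)
```
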

NMS (Non-Maximum Suppression)

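In object detection, the head predicts many boxes, each carrying an object class, a class score, and coordinates. Detection produces many redundant predictions, and NMS is used to remove them. The procedure is:
- (1) Take all predicted boxes of one class on one image
- (2) Sort them by class score and keep the box with the highest score
- (3) Use that highest-scoring box as the pivot, compute its IoU with each remaining box, and sort the resulting IoU values
- (4) Set an iou_threshold and select the boxes whose IoU exceeds it as redundant
- (5) There are two ways to handle the redundant boxes, hard NMS and soft NMS. Hard NMS sets the score of every selected box to 0. Soft NMS uses a Gaussian function to decay the score: the larger the IoU, the more the score is reduced, and the smaller the IoU, the less it is reduced
- (6) Soft NMS exists to avoid wrongly deleting the lower-scoring of two densely packed boxes of the same class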
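
The difference between the two schemes in step (5) can be seen by computing the score multiplier each one assigns to a few IoU values. This is a sketch under assumptions: the helper name `suppression_weights` is hypothetical, and the Gaussian form exp(-iou^2 / sigma) with sigma=0.3 is one common choice.

```python
import numpy as np

def suppression_weights(iou, iou_threshold=0.5, sigma=0.3, method='nms'):
    """Score multipliers for boxes with the given IoU against the pivot.

    Hypothetical helper for illustration only.
    """
    iou = np.asarray(iou, dtype=np.float32)
    if method == 'nms':
        # Hard NMS: boxes above the threshold are zeroed, the rest are untouched
        return np.where(iou > iou_threshold, 0.0, 1.0).astype(np.float32)
    if method == 'soft-nms':
        # Soft NMS: Gaussian decay, stronger for larger IoU, never exactly 0
        return np.exp(-(iou ** 2) / sigma).astype(np.float32)
    raise ValueError("method must be 'nms' or 'soft-nms'")

ious = [0.1, 0.5, 0.9]
hard = suppression_weights(ious, method='nms')
soft = suppression_weights(ious, method='soft-nms')
print(hard, soft)
```

With these inputs, hard NMS removes only the box at IoU 0.9, while soft NMS scales the three scores by roughly 0.97, 0.43 and 0.07, so a heavily overlapping box survives with a much lower score.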

code

# params: bboxes: (xmin, ymin, xmax, ymax, score, class)
# params: iou_threshold, scale value
# params: sigma, scale of the Gaussian decay used by 'soft-nms'
# params: method, 'nms', 'soft-nms'

# return: best_bboxes

def nms(bboxes, iou_threshold, sigma=0.3, method='nms'):
	# Unique class ids present in the image (column 5)
	classes_in_img = list(set(bboxes[:, 5]))
	
	# Boxes to keep
	best_bboxes = []
	
	# Loop over each class
	for cls in classes_in_img:
		# Select the boxes of the current class
		cls_mask = (bboxes[:, 5] == cls)
		cls_bboxes = bboxes[cls_mask]
		
		# NMS loop
		while len(cls_bboxes) > 0:
			# Pick the box with the highest score and keep it
			max_ind = np.argmax(cls_bboxes[:, 4])
			best_bbox = cls_bboxes[max_ind]
			best_bboxes.append(best_bbox)
			
			# Remove that box from cls_bboxes
			cls_bboxes = np.concatenate([cls_bboxes[:max_ind], cls_bboxes[max_ind+1:]])
			
			# IoU between the kept box and the remaining boxes
			iou = bboxes_iou(best_bbox[np.newaxis, :4], cls_bboxes[:, :4])
			
			# Initialize the weights
			weight = np.ones(len(iou), dtype=np.float32)
			
			assert method in ['nms', 'soft-nms']
			
			# Hard NMS: mark the redundant boxes
			if method == 'nms':
				iou_mask = iou > iou_threshold
				# Set the weight of redundant boxes to 0
				weight[iou_mask] = 0.0
			
			# Soft NMS
			if method == 'soft-nms':
				# Gaussian decay
				weight = np.exp(-(1.0 * iou ** 2 / sigma))
			
			# Rescale the scores
			cls_bboxes[:, 4] *= weight
			
			# Drop the boxes whose score is now 0
			score_mask = cls_bboxes[:, 4] > 0
			cls_bboxes = cls_bboxes[score_mask]
	
	return best_bboxes

Another NMS implementation, from Bubbling.. on Bilibili

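(0) The backbone output has the format boxes: [bs, all_boxes, 4+1+num_classes]
(1) First convert the coordinates
(2) Then take all boxes of one image and filter them by conf_threshold
(3) Process the num_classes scores predicted for each box: with 20 classes, for example, the max_score over the 20 classes gives the box's predicted class, and its index gives the class label
(4) Group the boxes by class, then run the following loop for each class
- Take the conf_score of the boxes (the foreground/background score) and sort in descending order
- Keep the highest-scoring box, then compute the IoU between it and the remaining boxes
- nms_threshold filters on that IoU: boxes whose IoU exceeds nms_threshold are dropped
- Repeat until all boxes of the current class have been processed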
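
Steps (2) and (3), plus the descending sort used inside the per-class loop, can be tried on a tiny array. The numbers below are invented for illustration (3 boxes, 2 classes, coordinates already in corner form). This is only a sketch of the idea, not the original author's code.

```python
import numpy as np

# Hypothetical predictions for one image.
# Columns: xmin, ymin, xmax, ymax, conf, class_0 score, class_1 score
prediction = np.array([
    [0, 0, 10, 10, 0.9, 0.2, 0.8],
    [1, 1, 11, 11, 0.3, 0.7, 0.3],
    [50, 50, 60, 60, 0.6, 0.9, 0.1],
], dtype=np.float32)

# Step (2): keep the boxes whose conf exceeds conf_threshold
conf_threshold = 0.5
detections = prediction[prediction[:, 4] > conf_threshold]

# Step (3): best class score and its index for each remaining box
class_conf = np.max(detections[:, 5:], axis=-1, keepdims=True)
class_pred = np.argmax(detections[:, 5:], axis=-1)[:, np.newaxis].astype(np.float32)

# New layout: 4 coordinates, conf, class_conf, class_pred
detections = np.concatenate([detections[:, :5], class_conf, class_pred], axis=1)

# Sort by conf in descending order: argsort is ascending, so reverse it
order = np.argsort(detections[:, 4])[::-1]
detections = detections[order]
print(detections)
```

The second box is dropped by the confidence filter, and the two survivors end up labelled class 1 and class 0 respectively.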


#coding=utf-8

import numpy as np

def nms(boxes, conf_threshold=0.5, nms_threshold=0.4):
    # boxes: [bs, all_boxes, 4+1+num_classes]
    bs = np.shape(boxes)[0]

    # Convert (center x, center y, w, h) to top-left and bottom-right corners
    shape_boxes = np.zeros_like(boxes[:,:,:4])
    shape_boxes[:,:,0] = boxes[:,:,0] - boxes[:,:,2] / 2
    shape_boxes[:,:,1] = boxes[:,:,1] - boxes[:,:,3] / 2
    shape_boxes[:,:,2] = boxes[:,:,0] + boxes[:,:,2] / 2
    shape_boxes[:,:,3] = boxes[:,:,1] + boxes[:,:,3] / 2

    boxes[:,:,:4] = shape_boxes
    output = []

    for i in range(bs):
        # All predicted boxes of the current image
        prediction = boxes[i]
        # Foreground (objectness) score of each predicted box
        score = prediction[:, 4]
        # Keep the boxes whose score exceeds conf_threshold
        mask = score > conf_threshold
        detections = prediction[mask]

        # For each box, find the best class (e.g. one of 80) and its score
        class_conf = np.expand_dims(np.max(detections[:,5:], axis=-1), axis=-1)
        class_pred = np.expand_dims(np.argmax(detections[:,5:], axis=-1), axis=-1)

        # Columns after this: 4 coordinates, conf, class_conf, class_pred
        detections = np.concatenate([detections[:,:5], class_conf, class_pred], 1)

        unique_class = np.unique(detections[:, -1])
        if len(unique_class) == 0:
            continue

        best_box = []

        for c in unique_class:
            cls_mask = detections[:, -1] == c
            detection = detections[cls_mask]
            scores = detection[:, 4]

            # Sort by score in descending order
            arg_sort = np.argsort(scores)[::-1]
            detection = detection[arg_sort]

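            while len(detection) != 0:
                best_box.append(detection[0])
                if len(detection) == 1:
                    break
                # The original calls an iou() helper that is not shown here;
                # bboxes_iou from the first section is assumed as a stand-in
                ious = bboxes_iou(detection[0, :4], detection[1:, :4])
                detection = detection[1:][ious < nms_threshold]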
        output.append(best_box)
    # Per-image results differ in length, so return the list rather than np.array(output)
    return output