Multi-scale object detection algorithm combined with super-resolution reconstruction technology

doi:10.19665/j.issn1001-2400.2023.03.012

Abstract

Most current object detection algorithms suffer from low overall precision and recall when object scales vary widely, which leads to false detections and missed detections. To address this problem, this paper proposes a multi-scale object detection algorithm that incorporates super-resolution reconstruction. First, building on the one-stage YOLO framework, a super-resolution reconstruction module is added to the neck network during multi-scale feature fusion, which avoids further loss of detail in the deeper feature maps. Second, a channel attention module suppresses irrelevant features in the shallower feature maps and focuses on the channels that carry object contour information, further strengthening the representational capacity of the shallow features. Finally, ablation and comparative experiments are carried out on the public PASCAL VOC 2007 and MS COCO 2017 datasets. The results show that each proposed module improves detection performance to a different degree. Compared with other current multi-scale object detection algorithms, the proposed algorithm raises average precision on small, medium, and large objects by about 1.20, 1.20, and 1.30 percentage points and average recall by about 4.20, 3.50, and 4.20 percentage points, respectively, further improving overall detection performance.

Key words: multi-scale object detection, super-resolution technology, attention mechanism, deep learning
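
The abstract does not say how the super-resolution reconstruction module enlarges a deep feature map. One common choice is sub-pixel convolution (pixel shuffle), and the sketch below assumes that choice purely for illustration. It shows only the rearrangement step, in which groups of r×r channels are interleaved into a feature map that is r times larger in each spatial dimension; in a real network a learned convolution would first produce those channels. The function name `pixel_shuffle` and the nested-list layout are assumptions, not the authors' code.

```python
def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) nested list into (C, H*r, W*r).

    Input channel c*r*r + i*r + j supplies the output pixel at
    row h*r + i, column w*r + j of output channel c.
    """
    channels_in = len(x)
    if channels_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    h, w = len(x[0]), len(x[0][0])
    channels_out = channels_in // (r * r)
    out = [[[0.0] * (w * r) for _ in range(h * r)] for _ in range(channels_out)]
    for c in range(channels_out):
        for i in range(r):
            for j in range(r):
                src = x[c * r * r + i * r + j]
                for row in range(h):
                    for col in range(w):
                        out[c][row * r + i][col * r + j] = src[row][col]
    return out


if __name__ == "__main__":
    # Four 1x1 channels become one 2x2 channel when r = 2.
    print(pixel_shuffle([[[1]], [[2]], [[3]], [[4]]], 2))
```

Unlike interpolation, this rearrangement discards no values: every input activation appears exactly once in the enlarged map, which is in line with the stated goal of not losing detail from the deeper layers.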

CLC number:

TP183

WANG Juan, LIU Zishan, WU Minghu, CHEN Guanhai, GUO Liquan. Multi-scale object detection algorithm combined with super-resolution reconstruction technology[J]. Journal of Xidian University, 2023, 50(3): 122-131.

Model	Params/M	GFLOPs	mAP@0.5/%	mAP@0.5:0.95/%
Baseline	8.95	26.68	79.85	56.92
Baseline+CSPCM	9.24	27.62	80.74	58.18
Baseline+CSPCM+ECA	9.24	27.62	81.13	58.44
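
In the ablation results above, adding ECA leaves the parameter count and GFLOPs unchanged at two decimal places. Assuming "ECA" denotes the Efficient Channel Attention block of ECA-Net, this is expected: its only learnable weights are the k taps of a 1D convolution applied across per-channel averages. The following is a minimal pure-Python sketch of that computation, not the authors' implementation; the function name `eca` and the fixed `kernel` argument are illustrative assumptions.

```python
import math


def eca(feature_map, kernel):
    """Apply ECA-style channel attention to a (C, H, W) nested list.

    `kernel` is a hypothetical list of odd length k holding the weights of
    the 1D convolution that slides across neighbouring channels.
    Returns the rescaled feature map and the per-channel gates.
    """
    k = len(kernel)
    if k % 2 == 0:
        raise ValueError("kernel length must be odd")
    pad = k // 2
    # 1) Global average pooling: one scalar descriptor per channel.
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
              for ch in feature_map]
    # 2) 1D convolution across channels with zero padding (k weights in total).
    padded = [0.0] * pad + pooled + [0.0] * pad
    mixed = [sum(kernel[t] * padded[c + t] for t in range(k))
             for c in range(len(pooled))]
    # 3) Sigmoid gate, then rescale every value in the matching channel.
    gates = [1.0 / (1.0 + math.exp(-m)) for m in mixed]
    scaled = [[[g * v for v in row] for row in ch]
              for g, ch in zip(gates, feature_map)]
    return scaled, gates


if __name__ == "__main__":
    fmap = [[[0.0, 0.0]], [[2.0, 2.0]], [[-2.0, -2.0]]]  # 3 channels, 1x2 each
    _, gates = eca(fmap, [0.0, 1.0, 0.0])
    print(gates)
```

With a kernel of length 3 the block adds three weights regardless of the channel count, which is far below the resolution of a parameter column reported in millions.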

Structure	Params/M	GFLOPs	mAP@0.5/%	mAP@0.5:0.95/%
Fig. 4(a)	9.50	28.46	80.87	58.13
Fig. 4(b)	9.31	27.86	80.65	57.94
Fig. 4(c)	9.39	28.09	80.67	58.12
Fig. 4(d)	9.24	27.62	80.74	58.18

Algorithm	Backbone	Params/M	AP@0.5/%	AP@0.75/%	AP@0.5:0.95/%
YOLOv3-tiny	Darknet-53	8.80	34.80	-	17.60
YOLOv3	Darknet-53	61.90	63.00	-	43.30
YOLOv5-s	Modified CSP v5	7.23	57.20	40.30	37.40
YOLOv5-m	Modified CSP v5	21.17	64.40	49.00	45.30
YOLOX-s	Modified CSP v5	8.97	59.40	42.90	40.00
YOLOX-m	Modified CSP v5	25.33	65.50	50.10	46.30
Proposed-s	Modified CSP v5	9.26	60.20	44.50	41.30
Proposed-m	Modified CSP v5	25.88	66.20	51.20	47.50

Algorithm	FPS/(frames·s⁻¹)	AP_S/%	AP_M/%	AP_L/%	AR_S/%	AR_M/%	AR_L/%
YOLOv3-tiny	140.10	-	-	-	-	-	-
YOLOv3	88.20	-	-	-	-	-	-
YOLOv5-s	156.30	21.20	42.30	49.10	37.80	62.50	72.20
YOLOv5-m	112.40	27.90	50.50	58.10	45.30	68.50	77.70
YOLOX-s	138.10	22.90	44.30	53.70	34.40	60.30	69.90
YOLOX-m	98.60	28.60	51.20	61.60	42.00	65.80	75.40
Proposed-s	134.40	24.10	45.50	55.00	38.60	63.80	74.10
Proposed-m	96.50	30.00	51.90	62.90	45.20	68.60	78.20
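
The gains quoted in the abstract can be reproduced from the size-specific columns above by subtracting the YOLOX-s row from the row of the proposed -s model. The short script below does that arithmetic; the values are copied from the table, and treating YOLOX-s as the reference is an inference from the numbers rather than something the page states.

```python
# AP/AR by object size on MS COCO 2017, copied from the comparison table (in %).
METRICS = ("AP_S", "AP_M", "AP_L", "AR_S", "AR_M", "AR_L")
YOLOX_S = (22.90, 44.30, 53.70, 34.40, 60.30, 69.90)
PROPOSED_S = (24.10, 45.50, 55.00, 38.60, 63.80, 74.10)


def gains(ours, baseline):
    """Return the per-metric difference in percentage points, rounded to 2 places."""
    return {m: round(o - b, 2) for m, o, b in zip(METRICS, ours, baseline)}


if __name__ == "__main__":
    for metric, delta in gains(PROPOSED_S, YOLOX_S).items():
        print(f"{metric}: +{delta:.2f} points")
```

The differences are absolute, so they are best read as percentage points rather than relative improvements.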

Algorithm	mAP	bird	boat	bottle	potted plant	chair	aeroplane	bicycle	bus	car	cat
YOLOv3-tiny	54.10	45.80	39.70	43.60	35.40	39.10	57.40	69.50	60.80	74.00	49.40
YOLOv5-s	76.50	74.00	64.00	66.90	51.60	58.70	87.50	85.40	82.10	89.90	80.10
YOLOX-s	79.85	77.49	72.32	70.15	59.39	62.59	87.75	87.54	86.06	89.21	82.33
Proposed-s	81.13	79.62	74.30	71.48	60.29	64.97	88.00	88.61	87.23	89.33	83.09

Algorithm	cow	dining table	dog	horse	motorbike	person	sheep	sofa	train	TV monitor
YOLOv3-tiny	61.90	41.00	45.40	65.20	70.80	71.40	60.00	40.00	52.40	59.40
YOLOv5-s	80.90	67.50	78.30	88.00	82.90	86.40	75.00	70.90	84.40	75.70
YOLOX-s	83.40	76.10	81.63	89.02	86.20	86.53	80.12	74.84	84.30	80.03
Proposed-s	84.82	77.03	83.51	89.85	87.86	86.98	81.54	76.31	85.96	81.89
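
As a consistency check on the per-class results above, the mAP column should equal the mean of the 20 class APs. The sketch below recomputes that mean for YOLOX-s and for the proposed -s model and reports the class with the largest gain. The values are copied from the table; the English class names and the helper names `mean_ap` and `largest_gain` are chosen here for illustration.

```python
# Per-class AP (%) on PASCAL VOC 2007, in the column order of the table.
CLASSES = [
    "bird", "boat", "bottle", "potted plant", "chair", "aeroplane", "bicycle",
    "bus", "car", "cat", "cow", "dining table", "dog", "horse", "motorbike",
    "person", "sheep", "sofa", "train", "TV monitor",
]
YOLOX_S = [
    77.49, 72.32, 70.15, 59.39, 62.59, 87.75, 87.54, 86.06, 89.21, 82.33,
    83.40, 76.10, 81.63, 89.02, 86.20, 86.53, 80.12, 74.84, 84.30, 80.03,
]
PROPOSED_S = [
    79.62, 74.30, 71.48, 60.29, 64.97, 88.00, 88.61, 87.23, 89.33, 83.09,
    84.82, 77.03, 83.51, 89.85, 87.86, 86.98, 81.54, 76.31, 85.96, 81.89,
]


def mean_ap(per_class):
    """Average the per-class AP values."""
    return sum(per_class) / len(per_class)


def largest_gain(ours, baseline):
    """Return (class name, gain in percentage points) for the biggest improvement."""
    deltas = {name: o - b for name, o, b in zip(CLASSES, ours, baseline)}
    best = max(deltas, key=deltas.get)
    return best, round(deltas[best], 2)


if __name__ == "__main__":
    print(f"YOLOX-s mAP:    {mean_ap(YOLOX_S):.2f}")
    print(f"Proposed-s mAP: {mean_ap(PROPOSED_S):.2f}")
    print("Largest gain:", largest_gain(PROPOSED_S, YOLOX_S))
```

Both recomputed means agree with the reported mAP values to two decimal places.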