Small Object Detection Based on Convolution and Self-Attention Aggregation

doi:10.16180/j.cnki.issn1007-7820.2024.02.003

Abstract

Small object detection remains an active research topic on most public object detection datasets. To address the insufficient accuracy of detectors on small objects in multi-scale detection scenarios, this paper proposes an improved small object detection model based on YOLOv5s (You Only Look Once version 5s). A convolution and self-attention aggregation residual block is added to the detector's feature extraction network to strengthen feature extraction, and a new feature map is drawn from the shallow layers of the network to enrich small-object feature information. The feature fusion network is restructured to make full use of the newly introduced shallow features. SIOU Loss replaces the original GIOU Loss as the bounding-box regression loss, improving detection accuracy and training speed. Experimental results show that on the PASCAL VOC 2007 and 2012 datasets the improved model raises detection accuracy by 0.012 and small object detection accuracy by 0.023 over YOLOv5s; on the MS COCO dataset it raises detection accuracy by 0.001 and small object detection accuracy by 0.009 over YOLOv5s.

Key words: small object, object detection, YOLOv5s, convolutional neural network, self-attention, ACmix, SIOU Loss, residual network
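The abstract names GIOU Loss as the baseline bounding-box loss that SIOU Loss replaces. As a point of reference, the following is a minimal pure-Python sketch of the standard generalized IoU definition for two axis-aligned boxes. The (x1, y1, x2, y2) box format and the names giou and giou_loss are assumptions made for illustration; this is not the YOLOv5s implementation, and the SIOU Loss terms are not reproduced here.

```python
from typing import Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def giou(a: Box, b: Box) -> float:
    """Generalized IoU of two axis-aligned boxes, in the range (-1, 1]."""
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])

    # Intersection rectangle; its area is zero when the boxes do not overlap.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = area_a + area_b - inter

    # Smallest axis-aligned box that encloses both inputs.
    enclose_w = max(a[2], b[2]) - min(a[0], b[0])
    enclose_h = max(a[3], b[3]) - min(a[1], b[1])
    enclose = enclose_w * enclose_h

    iou = inter / union
    # Penalize the part of the enclosing box not covered by the union.
    return iou - (enclose - union) / enclose


def giou_loss(pred: Box, target: Box) -> float:
    """Loss form of GIoU: 0 for a perfect match, approaching 2 for distant boxes."""
    return 1.0 - giou(pred, target)
```

Unlike plain IoU, the enclosing-box term keeps the value informative when the two boxes do not overlap at all, so a regression loss built on it still distinguishes near misses from distant ones.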

CLC number: TN247

Cite this article: WANG Xiaozhu, YU Lianzhi. Small Object Detection Based on Convolution and Self-Attention of Aggregation[J]. Electronic Science and Technology, 2024, 37(2): 14-22.



Table 1
Model	Input size	Parameters/MB	mAP50	mAP50:95
SSD	300×300	26.285	0.783	0.470
YOLOv3	416×416	61.626	0.851	0.583
YOLOv4-tiny	416×416	5.918	0.781	0.403
YOLOv4	416×416	64.040	0.880	0.602
YOLOv5s	640×640	7.115	0.860	0.591
Ours	640×640	9.279	0.872	0.599

Table 2
Model	Input size	Computation/GB	mAP50	mAP50:95
SSD	640×640	282.197	0.742	0.408
YOLOv3	640×640	155.404	0.806	0.444
YOLOv4-tiny	640×640	16.216	0.676	0.308
YOLOv4	640×640	141.766	0.775	0.430
YOLOv5s	640×640	16.541	0.860	0.591
Ours	640×640	23.040	0.872	0.599

Table 3
Model	Input size	mAP50	Small (mAP50:95)
SSD	640×640	0.742	0.265
YOLOv3	640×640	0.806	0.362
YOLOv4-tiny	640×640	0.676	0.269
YOLOv4	640×640	0.775	0.385
YOLOv5s	640×640	0.860	0.390
Ours	640×640	0.872	0.413

Table 4
Model	ResAC	4-FPN+PAN	SIOU Loss	mAP50
Model 1	√			0.865
Model 2		√		0.830
Model 3			√	0.862
Model 4	√	√		0.865
Model 5	√		√	0.868
Model 6	√	√	√	0.872

Table 5
Model	Input size	mAP50	mAP50:95	Small (mAP50:95)
YOLOv5s	640×640	0.539	0.356	0.206
Ours	640×640	0.549	0.355	0.215
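
The gaps between YOLOv5s and the improved model can be recomputed directly from the tabulated values. The short sketch below does that arithmetic. The numbers are copied from the result tables above, with the first group being the PASCAL VOC figures and the second the MS COCO figures; the variable and function names are illustrative assumptions.

```python
# (YOLOv5s, improved model) pairs copied from the result tables above.
voc = {
    "mAP50": (0.860, 0.872),
    "mAP50:95": (0.591, 0.599),
    "Small mAP50:95": (0.390, 0.413),
}
coco = {
    "mAP50": (0.539, 0.549),
    "mAP50:95": (0.356, 0.355),
    "Small mAP50:95": (0.206, 0.215),
}


def gains(results):
    """Absolute gain of the improved model over YOLOv5s, rounded to 3 decimals."""
    return {metric: round(ours - base, 3) for metric, (base, ours) in results.items()}


voc_gains = gains(voc)
coco_gains = gains(coco)

# Relative cost of the improved model, using the tabulated parameter and
# computation figures (units as given in the tables).
param_overhead = 9.279 / 7.115 - 1.0
compute_overhead = 23.040 / 16.541 - 1.0

for name, table in (("VOC", voc_gains), ("COCO", coco_gains)):
    for metric, delta in table.items():
        print(f"{name} {metric}: {delta:+.3f}")
print(f"parameters: {param_overhead:+.1%}, computation: {compute_overhead:+.1%}")
```

On the tabulated values, the VOC gaps are 0.012 for mAP50 and 0.023 for small objects. The COCO gaps are 0.010 for mAP50, -0.001 for mAP50:95 and 0.009 for small objects. The improved model carries roughly 30% more parameters and 39% more computation than YOLOv5s.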