Improved YOLOv3 Model Based on New Feature Extraction and Fusion Module

doi:10.16180/j.cnki.issn1007-7820.2022.07.007


Abstract:

The feature extraction branch and the multi-scale detection branch of YOLOv3 leave room for optimization. This study proposes two structural improvements that raise the model's detection accuracy on an object detection dataset. First, YOLOv3 predicts at three scales (13×13, 26×26 and 52×52) with prior anchor boxes of different widths and heights at each scale, while the ground-truth boxes are the same for all three scales; a feature fusion scheme between the scales is therefore designed to improve accuracy. Second, because a standard convolutional layer applies the same fixed spatial sampling grid at every position, the original convolutional layers are replaced with deformable convolutions to improve accuracy further. On an industrial tool dataset, the improved model raises test-set mAP by 3.6 points over the original YOLOv3.
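In YOLOv3, each ground-truth box is handled by the scale whose anchor best matches the box's shape, which is why the three scales share the same labels but use different anchors. The sketch below illustrates this matching under stated assumptions: a 416×416 input (as in the training settings), strides 32/16/8 for the 13×13/26×26/52×52 grids, and the nine default COCO anchors from the original YOLOv3 configuration. The anchors used for the industrial tool dataset are not given in this section, and `shape_iou` and `assign` are hypothetical helper names.

```python
# Default YOLOv3 (COCO) anchors in pixels for a 416x416 input. This is an
# assumption: the anchors used for the industrial tool dataset are not given.
ANCHORS = [(10, 13), (16, 30), (33, 23),       # small objects: 52x52 grid, stride 8
           (30, 61), (62, 45), (59, 119),      # medium objects: 26x26 grid, stride 16
           (116, 90), (156, 198), (373, 326)]  # large objects: 13x13 grid, stride 32
STRIDES = [8, 16, 32]  # one stride per group of three anchors
INPUT_SIZE = 416


def shape_iou(w1, h1, w2, h2):
    """IoU of two boxes that share the same centre (compares width/height only)."""
    inter = min(w1, w2) * min(h1, h2)
    return inter / (w1 * h1 + w2 * h2 - inter)


def assign(box):
    """Map a ground-truth box (cx, cy, w, h) in pixels to (grid size, cell, anchor index)."""
    cx, cy, w, h = box
    ious = [shape_iou(w, h, aw, ah) for aw, ah in ANCHORS]
    best = max(range(len(ANCHORS)), key=lambda i: ious[i])
    stride = STRIDES[best // 3]                      # the anchor's group fixes the scale
    grid = INPUT_SIZE // stride                      # 52, 26 or 13
    cell = (int(cx // stride), int(cy // stride))    # (column, row) of the responsible cell
    return grid, cell, best


for gt in [(100, 120, 20, 25), (208, 208, 60, 50), (300, 200, 150, 190)]:
    print(gt, "->", assign(gt))
# (100, 120, 20, 25) -> (52, (12, 15), 1)
# (208, 208, 60, 50) -> (26, (13, 13), 4)
# (300, 200, 150, 190) -> (13, (9, 6), 7)
```

Only the single best-matching anchor is kept here; some implementations additionally assign a box to every anchor whose shape IoU exceeds a threshold.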

Key words: object detection, deep learning, multi-scale fusion, industrial tool detection, residual module, YOLOv3, IoU loss

CLC number: TP274

ZHAO Xuan, ZHOU Fan, YU Hancheng. Improved YOLOv3 Model Based on New Feature Extraction and Fusion Module[J]. Electronic Science and Technology, 2022, 35(7): 40-45.

Figures/Tables (11): Figure 1, Figure 2, Table 1, Table 2, Figure 3, Table 3, Table 4, Figure 4, Figure 5, Table 5, Table 6


Parameter	Setting
Batch size	4
Image resolution	416×416
Multi-scale training	Yes
Input channels	3
Momentum	0.9
Weight decay	0.0005
Learning rate	0.001
Optimizer	Adam
Dataset size	11,765
Dataset split (train/test)	0.9/0.1
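The settings above can be gathered into one configuration object, from which the split sizes and the detection-grid sizes follow directly. The sketch below is illustrative only: the field names are hypothetical, reading "momentum 0.9" as Adam's first-moment coefficient β1 and "decay" as weight decay are assumptions, and rounding the training share down is a choice the source does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    # Values from the training-settings table; field names are hypothetical.
    batch_size: int = 4
    input_size: int = 416          # 416x416 input images
    multi_scale: bool = True
    in_channels: int = 3
    beta1: float = 0.9             # assumption: "momentum" read as Adam's beta1
    weight_decay: float = 0.0005   # assumption: "decay" read as weight decay
    learning_rate: float = 0.001
    optimizer: str = "Adam"
    dataset_size: int = 11765
    train_fraction: float = 0.9    # 0.9/0.1 train/test split


def split_sizes(cfg):
    """Return (train, test) image counts, rounding the training share down."""
    n_train = int(cfg.dataset_size * cfg.train_fraction)
    return n_train, cfg.dataset_size - n_train


def grid_sizes(cfg, strides=(32, 16, 8)):
    """Side length of the output grid at each YOLOv3 detection scale."""
    return [cfg.input_size // s for s in strides]


cfg = TrainConfig()
print(split_sizes(cfg))  # (10588, 1177)
print(grid_sizes(cfg))   # [13, 26, 52]
```

With multi-scale training the input size, and therefore every grid size, changes between batches; YOLO-style implementations typically draw sizes that are multiples of 32 so that all three grids stay integral.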

Epoch	40	50	55	57	59	Best
mAP1, Base-YOLOv3 (%)	63.04	73.00	78.00	56.00	76.50	83.40
mAP2, Up-route (%)	70.00	75.00	74.00	69.00	82.10	86.62
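The first improvement in the abstract is a feature fusion scheme between scales, presumably the variant listed as Up-route in the model comparison. Its exact design is not described in this section, so the sketch below shows only the standard YOLOv3 top-down fusion step for reference: the coarse 13×13 map is upsampled 2× by nearest-neighbour interpolation and concatenated along the channel axis with the 26×26 map. Feature maps are plain nested lists indexed `[channel][row][column]`, the 1×1 convolution that YOLOv3 applies before upsampling is omitted, and the function names are hypothetical.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a [C][H][W] feature map."""
    out = []
    for channel in fmap:
        rows = []
        for row in channel:
            wide = [v for v in row for _ in range(2)]  # repeat each column twice
            rows.append(wide)
            rows.append(list(wide))                     # repeat each row twice
        out.append(rows)
    return out


def concat_channels(a, b):
    """Concatenate two [C][H][W] maps with equal H and W along the channel axis."""
    if len(a[0]) != len(b[0]) or len(a[0][0]) != len(b[0][0]):
        raise ValueError("spatial sizes must match")
    return a + b


def shape(fmap):
    """Return (channels, height, width) of a nested-list feature map."""
    return len(fmap), len(fmap[0]), len(fmap[0][0])


coarse = [[[float(c + r + w) for w in range(13)] for r in range(13)] for c in range(2)]  # 2x13x13
fine = [[[0.0] * 26 for _ in range(26)] for _ in range(3)]                               # 3x26x26
fused = concat_channels(upsample2x(coarse), fine)
print(shape(fused))  # (5, 26, 26)
```

Concatenation keeps both sources intact and leaves it to the following convolutions to learn how to mix them, which is the design choice YOLOv3 makes in its route layers.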

Model	mAP (%)
Base-YOLOv3	83.40
Up-route	86.62
Up-route + DCN	86.98
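Deformable convolution (DCN) adds a 2-D offset to each sampling point of the kernel, so the sampling grid is no longer the same fixed pattern at every position. The sketch below evaluates a single-channel 3×3 deformable convolution at one output location with bilinear sampling. The offsets are passed in directly instead of being predicted by an extra convolution as in a trained DCN layer, and `bilinear` and `deform_conv_at` are hypothetical names; it illustrates the sampling rule, not the authors' implementation.

```python
import math


def bilinear(img, y, x):
    """Bilinearly sample a 2-D list at real-valued (y, x); zero outside the image."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    total = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                weight = (1 - abs(y - yy)) * (1 - abs(x - xx))
                total += weight * img[yy][xx]
    return total


def deform_conv_at(img, kernel, p0, offsets):
    """3x3 deformable convolution at output position p0 = (row, col).

    offsets[k] = (dy, dx) is added to the k-th point of the regular 3x3 grid;
    all-zero offsets reduce this to an ordinary 3x3 convolution (cross-correlation).
    """
    grid = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    out = 0.0
    for k, (gy, gx) in enumerate(grid):
        oy, ox = offsets[k]
        y, x = p0[0] + gy + oy, p0[1] + gx + ox
        out += kernel[gy + 1][gx + 1] * bilinear(img, y, x)
    return out


img = [[float(5 * r + c) for c in range(5)] for r in range(5)]
ident = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
zero = [(0.0, 0.0)] * 9
print(deform_conv_at(img, ident, (2, 2), zero))     # 12.0: plain convolution
shifted = list(zero)
shifted[4] = (0.5, 0.5)                             # move the centre tap by half a pixel
print(deform_conv_at(img, ident, (2, 2), shifted))  # 15.0: mean of 12, 13, 17, 18
```

Because bilinear sampling is differentiable in the sampling position, the offsets can be learned by backpropagation together with the kernel weights.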

Class (AP, %)	Base-YOLOv3	Up-route	Up-route + DCN
Class '0' (Cutting Tools)	86.77	98.29	97.52
Class '1' (Fastener Tools)	77.18	83.70	79.12
Class '2' (Adhesive Tools)	88.62	90.97	92.58
Class '3' (Measuring Tools)	87.76	93.12	91.56
Class '4' (Clamp Tools)	84.11	75.34	80.81
Class '5' (Marker)	79.43	76.44	77.22
Class '6' (Polish Tools)	69.52	76.78	82.67
Class '7' (Protection Tools)	94.26	98.18	94.40
Average (mAP)	83.40	86.62	86.98
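The average row is the mean of the eight per-class AP values, i.e. the mAP. This section does not say how each class AP was computed; the sketch below assumes the all-point interpolated AP used by the PASCAL VOC devkit from 2010 on (area under the precision-recall curve after making precision non-increasing) and uses toy detections, not data from the paper. `average_precision` and `mean_ap` are hypothetical names, and values are fractions rather than percentages.

```python
def average_precision(scores, is_tp, n_gt):
    """All-point interpolated AP for one class.

    scores: confidence of each detection; is_tp: whether that detection matched a
    not-yet-matched ground-truth box (e.g. IoU >= 0.5); n_gt: number of ground-truth boxes.
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])  # highest confidence first
    tp = fp = 0
    rec, prec = [], []
    for i in order:
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        rec.append(tp / n_gt)
        prec.append(tp / (tp + fp))
    mrec = [0.0] + rec + [1.0]
    mpre = [0.0] + prec + [0.0]
    for i in range(len(mpre) - 2, -1, -1):  # make precision non-increasing in recall
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Sum rectangle areas wherever recall increases.
    return sum((mrec[i + 1] - mrec[i]) * mpre[i + 1] for i in range(len(mrec) - 1))


def mean_ap(aps):
    """mAP: unweighted mean of the per-class AP values."""
    return sum(aps) / len(aps)


ap_a = average_precision([0.9, 0.8, 0.7], [True, False, True], n_gt=2)
ap_b = average_precision([0.95, 0.6], [True, True], n_gt=2)
print(round(ap_a, 4), round(ap_b, 4))    # 0.8333 1.0
print(round(mean_ap([ap_a, ap_b]), 4))   # 0.9167
```

For the models in the table, the gain of Up-route + DCN over Base-YOLOv3 is 86.98 - 83.40 = 3.58 points, which the abstract rounds to 3.6.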

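Improvement method	Result
Method 1	Failed to converge
Method 2	Failed to converge
Method 3	Failed to converge
Method 4	No significant improvement in best accuracy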