Interframe target regression network for vehicle detection in UAV video

doi:10.19665/j.issn1001-2400.2021.04.020

Abstract:

UAV video offers a flexible viewing angle, a continuous field of view, and a wide monitoring range, but its densely distributed targets and strong motion noise make accurate target detection difficult. To address these problems, this paper proposes a vehicle detection algorithm for UAV video that incorporates an interframe target regression network. Because vehicles in UAV video are densely distributed, soft non-maximum suppression (Soft-NMS) is used as the detection-box merging strategy of fully convolutional one-stage object detection (FCOS), yielding a single-frame vehicle detector. When a single-frame detector is applied directly to video, motion noise can make the confidence of the same target fluctuate from frame to frame. To counter this, an interframe target regression network is designed: it exploits interframe motion continuity to fuse the target features of several adjacent frames, then matches the fused features against the target features of the current frame and regresses the prediction. Finally, the prediction is corrected with the single-frame detection result to improve detection performance. A more comprehensive UAV video vehicle dataset is built by filtering, merging, and additionally annotating existing UAV datasets. On this dataset, the average precision of the proposed algorithm reaches 47.42%, about 2 and 5 percentage points higher than FCOS and flow-guided feature aggregation (FGFA), respectively. Experimental results show that the proposed method outperforms FCOS, FGFA, and other video object detection algorithms, with better robustness and generalization.

Key words: UAV video, vehicle detection, interframe motion, feature fusion, interframe target regression
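The abstract names Soft-NMS as the box-merging strategy for densely packed vehicles but gives no formula. The sketch below shows the Gaussian variant described by Bodla et al. [12], in which a box overlapping an already selected box has its score decayed rather than being discarded. It is an illustration only: the function names, the `(x1, y1, x2, y2)` box layout, and the `sigma` and `score_thresh` values are assumptions, not settings taken from the paper.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS (illustrative; parameter values are assumptions).

    Returns a list of (box_index, final_score) in selection order.
    """
    remaining = [[i, s] for i, s in enumerate(scores)]
    kept = []
    while remaining:
        # Select the candidate with the highest current score.
        best_pos = max(range(len(remaining)), key=lambda k: remaining[k][1])
        best_idx, best_score = remaining.pop(best_pos)
        kept.append((best_idx, best_score))

        survivors = []
        for idx, score in remaining:
            overlap = iou(boxes[best_idx], boxes[idx])
            # Decay the score by a Gaussian of the overlap instead of
            # removing the box outright, as hard NMS would.
            score *= math.exp(-(overlap ** 2) / sigma)
            if score >= score_thresh:
                survivors.append([idx, score])
        remaining = survivors
    return kept
```

With two heavily overlapping boxes and one isolated box, hard NMS at an IoU threshold of 0.5 would drop the second overlapping box, whereas this sketch keeps all three and only lowers that box's score, which is the behaviour that helps when neighbouring vehicles genuinely overlap.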

CLC number:

TP391.4

ZHANG Zhi, ZHENG Jin. Interframe target regression network for vehicle detection in UAV video[J]. Journal of Xidian University, 2021, 48(4): 151-158.

Figures/Tables (6): Figures 1-4; Tables 1-2

References (18)

[1]	LI Q, MOU L, XU Q, et al. R3-Net:a Deep Network for Multioriented Vehicle Detection in Aerial Images and Videos[J]. IEEE Transactions on Geoscience and Remote Sensing, 2019, 57(7):5028-5042. doi: 10.1109/TGRS.36
[2]	REN S, HE K, GIRSHICK R, et al. Faster R-CNN:Towards Real-Time Object Detection with Region Proposal Networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6):1137-1149. doi: 10.1109/TPAMI.2016.2577031
[3]	XIE X, YANG W, CAO G, et al. Real-Time Vehicle Detection from UAV Imagery[C]//Proceedings of the 2018 IEEE Fourth International Conference on Multimedia Big Data(BigMM).Piscataway:IEEE, 2018:1-5.
[4]	LIU W, ANGUELOV D, ERHAN D, et al. SSD:Single Shot Multibox Detector[C]//Proceedings of the 14th European Conference on Computer Vision Amsterdam.Berlin:Springer, 2016:21-37.
[5]	LIN T Y, GOYAL P, GIRSHICK R, et al. Focal Loss for Dense Object Detection[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42(2):318-327. doi: 10.1109/TPAMI.34
[6]	VAN ETTEN A. You Only Look Twice:Rapid Multi-Scale Object Detection In Satellite Imagery[EB/OL].[2018-05-24].https://arxiv.org/abs/1805.09512 .
[7]	CUI Yanpeng, WANG Yuanhao, HU Jianwei. Detection Method for a Dynamic Small Target Using the Improved YOLOv3[J]. Journal of Xidian University, 2020, 47(3):1-7.
[8]	REDMON J, FARHADI A. YOLOv3:an Incremental Improvement[EB/OL].[2018-04-08].https://arxiv.org/abs/1804.02767 .
[9]	KANG K, LI H. T-CNN:Tubelets with Convolutional Neural Networks for Object Detection from Videos[EB/OL].[2017-08-03].https://arxiv.org/abs/1604.02532 .
[10]	ZHU X. Towards High Performance Video Object Detection[C]//Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition.Piscataway:IEEE, 2018:7210-7218.
[11]	WANG S, ZHOU Y, YAN J, et al. Fully Motion-Aware Network for Video Object Detection[C]//Proceedings of the European Conference on Computer Vision.Munich:Springer, 2018:557-573.
[12]	BODLA N, SINGH B, CHELLAPPA R, et al. Soft-NMS:Improving Object Detection with One Line of Code[C]//Proceedings of the 2017 IEEE International Conference on Computer Vision(ICCV).Piscataway:IEEE, 2017:5562-5570.
[13]	TIAN Z, SHEN C, CHEN H, et al. FCOS:Fully Convolutional One-Stage Object Detection[C]//Proceedings of the 2019 IEEE International Conference on Computer Vision(ICCV).Piscataway:IEEE, 2019:9626-9635.
[14]	HE K, ZHANG X, REN S, et al. Deep Residual Learning for Image Recognition[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition(CVPR).Piscataway:IEEE, 2016:770-778.
[15]	LIN T Y, DOLLAR P, GIRSHICK R, et al. Feature Pyramid Networks for Object Detection[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition(CVPR).Piscataway:IEEE, 2017:936-944.
[16]	ZHU P, WEN L, BIAN X, et al. Vision Meets Drones:a Challenge[EB/OL].[2018-04-23].https://arxiv.org/abs/1804.07437v2 .
[17]	HELD D, THRUN S, SAVARESE S. Learning to Track at 100 FPS with Deep Regression Networks[C]//Proceedings of the European Conference on Computer Vision Amsterdam.Berlin:Springer, 2016:749-765.
[18]	RAZAKARIVONY S, JURIE F. Vehicle Detection in Aerial Imagery:a Small Target Detection Benchmark[J]. Journal of Visual Communication and Image Representation, 2016, 34:187-203.

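Method	AP@0.5 (%)
FCOS	45.52
FCOS + Soft-NMS (single-frame UAV image detector)	46.38
FCOS + Soft-NMS + interframe target regression network, fusing 1 frame	47.30
FCOS + Soft-NMS + interframe target regression network, fusing 5 frames	47.42
FCOS + Soft-NMS + interframe target regression network, fusing 10 frames	47.10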

Method	AP@0.5 (%)
Faster R-CNN^[2] (ResNet101+FPN)	44.82
RetinaNet^[5] (ResNet101+FPN)	44.25
YOLOv3^[8]	36.12
FCOS^[13] (ResNet101+FPN)	45.52
FGFA^[10] (video detector)	42.83
Proposed algorithm	47.42
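
The gains quoted in the abstract can be checked against the two tables. The short sketch below recomputes them as absolute differences in percentage points and finds which number of fused frames scores best in the ablation. The AP@0.5 values are copied from the tables; the dictionary and function names are introduced here for illustration only.

```python
# AP@0.5 values (%) copied from the comparison table.
AP_COMPARISON = {"FCOS": 45.52, "FGFA": 42.83, "Proposed": 47.42}

# AP@0.5 values (%) from the ablation table, keyed by number of fused frames.
AP_BY_FUSED_FRAMES = {1: 47.30, 5: 47.42, 10: 47.10}


def gain_over(baseline, table=AP_COMPARISON, ours="Proposed"):
    """Absolute AP@0.5 gain of the proposed algorithm, in percentage points."""
    return round(table[ours] - table[baseline], 2)


def best_frame_count(table=AP_BY_FUSED_FRAMES):
    """Number of fused frames with the highest AP@0.5 in the ablation."""
    return max(table, key=table.get)


if __name__ == "__main__":
    print("gain over FCOS:", gain_over("FCOS"))
    print("gain over FGFA:", gain_over("FGFA"))
    print("best number of fused frames:", best_frame_count())
```

The differences come to 1.90 and 4.59 percentage points, consistent with the "about 2 and 5" stated in the abstract, and the ablation peaks when 5 frames are fused, with a slight drop at 10.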