Online classification jointed RGBT tracking based on the dual attention Siamese network

doi:10.19665/j.issn1001-2400.2022.06.010

Abstract:

Visible-light and thermal infrared sensors rely on different imaging mechanisms and therefore capture different information about a target. A dual-modal visual tracker built on visible and thermal infrared (RGBT) sequences can exploit the inherent correlation and complementarity of the two modalities, which reduces the limitations and uncertainty of single-modal information and makes the visual tracking system more robust. Existing algorithms that rely on image fusion or feature concatenation do not fully exploit the correlated and complementary information in visible and infrared images. To address this, we design an end-to-end trainable dual-modal Siamese network tracker for infrared and visible sequences. The network learns deep features from the visible and thermal infrared frames simultaneously and adaptively fuses the features of the two modalities through intra-modal and cross-modal dual attention mechanisms. In addition, because a Siamese network discriminates poorly between the target and the semantic background, we incorporate an online classification module into the tracking framework. The online-learned classifier reduces the influence of distractors and adapts to target changes during tracking. Experimental results show that the proposed algorithm effectively improves tracker performance: on the RGBT benchmark dataset GTOT, its precision rate and success rate are approximately 90.6% and 73.8%, about 5.5% and 4.3% higher than those of the baseline algorithm, respectively. Its overall performance is better than that of the other advanced tracking algorithms compared.

Key words: object tracking, RGB/thermal infrared, Siamese network, attention mechanism, deep learning
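The abstract describes the dual attention fusion only at a high level. As a minimal sketch, assuming each modality's feature map is a list of channels and that attention reduces to simple per-channel weighting (a simplification in the spirit of squeeze-and-excitation [23], not the architecture of the paper), intra-modal and cross-modal weighting might look like the following. The function names, the `gate_scale` parameter, and the toy inputs are all hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_means(feat):
    """Global average pooling: one scalar descriptor per channel."""
    return [sum(ch) / len(ch) for ch in feat]


def intra_modal_attention(feat, gate_scale):
    """Re-weight the channels of ONE modality with a sigmoid gate.

    gate_scale is a hypothetical learned per-channel parameter (assumption).
    """
    gates = [sigmoid(s * m) for s, m in zip(gate_scale, channel_means(feat))]
    return [[g * v for v in ch] for g, ch in zip(gates, feat)]


def cross_modal_fusion(feat_rgb, feat_tir):
    """Fuse TWO modalities with per-channel softmax weights (assumption)."""
    fused, weights = [], []
    means_rgb, means_tir = channel_means(feat_rgb), channel_means(feat_tir)
    for ch_rgb, ch_tir, m_rgb, m_tir in zip(feat_rgb, feat_tir, means_rgb, means_tir):
        top = max(m_rgb, m_tir)  # subtract the max for numerical stability
        e_rgb, e_tir = math.exp(m_rgb - top), math.exp(m_tir - top)
        w_rgb = e_rgb / (e_rgb + e_tir)
        weights.append((w_rgb, 1.0 - w_rgb))
        fused.append([w_rgb * a + (1.0 - w_rgb) * b for a, b in zip(ch_rgb, ch_tir)])
    return fused, weights


# Toy features: 2 channels, 4 flattened spatial positions each.
rgb = [[0.2, 0.4, 0.6, 0.8], [1.0, 1.0, 1.0, 1.0]]
tir = [[0.9, 0.7, 0.5, 0.3], [0.0, 0.0, 0.0, 0.0]]
scale = [1.0, 1.0]  # hypothetical gate parameters

rgb_att = intra_modal_attention(rgb, scale)
tir_att = intra_modal_attention(tir, scale)
fused, weights = cross_modal_fusion(rgb_att, tir_att)
```

In this toy input the second thermal channel is all zeros, so the fused second channel leans toward the visible modality. This illustrates the adaptive behaviour the abstract attributes to the fusion step, under the assumptions stated above.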

CLC number: TP391

ZHANG Zhaoyu, TIAN Chunna, ZHOU Heng, TIAN Xilan. Online classification jointed RGBT tracking based on the dual attention Siamese network[J]. Journal of Xidian University, 2022, 49(6): 76-85.

Figures and tables (9): Figs. 1-7, Tables 1-2

References (36)

[1]	YOU S, ZHU H, LI M, et al. A Review of Visual Trackers and Analysis of Its Application to Mobile Robot[EB/OL].[2021-10-02].arXiv:1910.09761,2019.
[2]	CHENG Lei, WANG Yue, TIAN Chunna. Residual Attention Mechanism for Visual Tracking[J]. Journal of Xidian University, 2020, 47(6):148-157.
[3]	HENRIQUES J F, CASEIRO R, MARTINS P, et al. High-Speed Tracking with Kernelized Correlation Filters[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(3):583-596. doi: 10.1109/TPAMI.2014.2345390 pmid: 26353263
[4]	BERTINETTO L, VALMADRE J, HENRIQUES J F, et al. Fully-Convolutional Siamese Networks for Object Tracking[C]// Proceedings of European Conference on Computer Vision Workshops.Heidelberg:Springer, 2016:850-865.
[5]	LI B, WU W, WANG Q, et al. SiamRPN++:Evolution of Siamese Visual Tracking with Very Deep Networks[C]// Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition. Los Alamitos: IEEE Computer Society, 2019:4282-4291.
[6]	YI Xiang, WANG Bingjian. Fusion of Infrared and Visual Images Guided by Visual Saliency[J]. Journal of Xidian University, 2019, 46(1):27-32.
[7]	ZHANG X, YE P, LEUNG H, et al. Object Fusion Tracking Based on Visible and Infrared Images:a Comprehensive Review[J]. Information Fusion, 2020, 63:166-187. doi: 10.1016/j.inffus.2020.05.002
[8]	YUN X, JING Z, XIAO G, et al. A Compressive Tracking Based on Time-Space Kalman Fusion Model[J]. Science China-Information Sciences, 2016, 59(1):1-15.
[9]	XIAO G, YUN X, WU J. A New Tracking Approach for Visible and Infrared Sequences Based on Tracking-Before-Fusion[J]. International Journal of Dynamics and Control, 2016, 4(1):40-51. doi: 10.1007/s40435-014-0115-4
[10]	ZHAI S, SHAO P, LIANG X, et al. Fast RGB-T Tracking via Cross-Modal Correlation Filters[J]. Neurocomputing, 2019, 334:172-181. doi: 10.1016/j.neucom.2019.01.022
[11]	YUN X, SUN Y, YANG X, et al. Discriminative Fusion Correlation Learning for Visible and Infrared Tracking[J]. Mathematical Problems in Engineering, 2019, 2019:1-11.
[12]	XIONG Yuejun, ZHANG Haitao, DENG Xia. RGBT Dual-Modal Tracking with Weighted Discriminative Correlation Filters[J]. Journal of Signal Processing, 2020, 36(9):1590-1597.
[13]	SONG Jianfeng, MIAO Qiguang, WANG Chongxiao, et al. Multi-Scale Single Object Tracking Based on the Attention Mechanism[J]. Journal of Xidian University, 2021, 48(5):110-116.
[14]	XU N, XIAO G, ZHANG X, et al. Relative Object Tracking Algorithm Based on Convolutional Neural Network for Visible and Infrared Video Sequences[C/OL].[2021-10-03]. https://xueshu.baidu.com/usercenter/paper/show?paperid=1c200e5061310vw04u680cb0y2513125&site.
[15]	LI C, LU A, ZHENG A, et al. Multi-Adapter RGBT Tracking[C]// Proceedings of the 2019 IEEE International Conference on Computer Vision Workshop.Piscataway:IEEE, 2019:2262-2270.
[16]	ZHU Y, LI C, LUO B, et al. Dense Feature Aggregation and Pruning for RGBT Tracking[C]// ACM Multimedia Conference. New York: ACM, 2019:465-472.
[17]	LU A, QIAN C, LI C, et al. Duality-Gated Mutual Condition Network for RGBT Tracking[EB/OL].[2021-09-28].arXiv:2011.07188,2020.
[18]	NAM H, HAN B. Learning Multi-Domain Convolutional Neural Networks for Visual Tracking[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Los Alamitos: IEEE Computer Society, 2016:4293-4302.
[19]	CHEN Z, ZHONG B, LI G, et al. Siamese Box Adaptive Network for Visual Tracking[C]// Proceedings of the 2020 IEEE Conference on Computer Vision and Pattern Recognition. Los Alamitos: IEEE Computer Society, 2020:6667-6676.
[20]	ZHANG X, YE P, QIAO D, et al. Object Fusion Tracking Based on Visible and Infrared Images Using Fully Convolutional Siamese Networks[C/OL].[2021-09-30]. doi: 10.23919/FUSION43075.2019.9011253
[21]	ZHANG X, YE P, PENG S, et al. DSiamMFT:An RGB-T Fusion Tracking Method via Dynamic Siamese Networks Using Multi-Layer Feature Fusion[J]. Signal Processing Image Communication, 2020, 84:115756.
[22]	SHEN Yali. RGBT Dual-Modal Siamese Tracking Network with Feature Fusion[J]. Infrared and Laser Engineering, 2021, 50(3):20200459.
[23]	HU J, SHEN L, SUN G. Squeeze-and-Excitation Networks[C]// Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Los Alamitos: IEEE Computer Society, 2018:7132-7141.
[24]	LIU Q, LU X, HE Z, et al. Deep Convolutional Neural Networks for Thermal Infrared Object Tracking[J]. Knowledge-Based Systems, 2017, 134:189-198. doi: 10.1016/j.knosys.2017.07.032
[25]	LI X, LIU Q, FAN N, et al. Hierarchical Spatial-Aware Siamese Network for Thermal Infrared Object Tracking[J]. Knowledge-Based Systems, 2019, 166:71-81. doi: 10.1016/j.knosys.2018.12.011
[26]	ZHU Z, WANG Q, LI B, et al. Distractor-Aware Siamese Networks for Visual Object Tracking[C/OL].[2021-10-04]. https://paperswithcode.com/paper/distractor-aware-siamese-networks-for-visual.
[27]	DANELLJAN M, BHAT G, KHAN F S, et al. ATOM:Accurate Tracking by Overlap Maximization[C]// Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition. Los Alamitos: IEEE Computer Society, 2019:4660-4669.
[28]	TROTTIER L, GIGUERE P, CHAIB-DRAA B. Parametric Exponential Linear Unit for Deep Convolutional Neural Networks[C/OL].[2021-10-10]. doi: 10.48550/arXiv.1605.09332
[29]	BHAT G, JOHNANDER J, DANELLJAN M, et al. Unveiling the Power of Deep Tracking[EB/OL].[2021-10-05].arXiv:1804.06833v1.
[30]	LI C, LIANG X, LU Y, et al. RGB-T Object Tracking:Benchmark and Baseline[J]. Pattern Recognition, 2019, 96(12):106977.
[31]	LI C, CHENG H, HU S, et al. Learning Collaborative Sparse Representation for Grayscale-Thermal Tracking[J]. IEEE Transactions on Image Processing, 2016, 25(12):5743-5756. doi: 10.1109/TIP.2016.2614135 pmid: 28114068
[32]	ZHANG H, ZHANG L, ZHOU L, et al. Object Tracking in RGB-T Videos Using Modal-Aware Attention Network and Competitive Learning[J]. Sensors, 2020, 20(2):393. doi: 10.3390/s20020393
[33]	YANG R, ZHU Y, WANG X, et al. Learning Target-Oriented Dual Attention for Robust RGB-T Tracking[C]// Proceedings of 2019 IEEE International Conference on Image Processing.Piscataway:IEEE, 2019:3975-3979.
[34]	LI C, ZHAO N, LU Y, et al. Weighted Sparse Representation Regularized Graph Learning for RGB-T Object Tracking[C]// ACM International Conference on Multimedia. New York: ACM, 2017:1856-1864.
[35]	LI C, ZHU C, HUANG Y, et al. Cross-Modal Ranking with Soft Consistency and Noisy Labels for Robust RGB-T Tracking[C/OL].[2021-10-06]. https://paperswithcode.com/paper/cross-modal-ranking-with-soft-consistency-and.
[36]	ZHANG Z, PENG H. Deeper and Wider Siamese Networks for Real-Time Visual Tracking[C]// Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition. Los Alamitos: IEEE Computer Society, 2019:4591-4600.

Metric	λ₁=0.2, λ₂=0.8	λ₁=0.5, λ₂=0.5	λ₁=0.8, λ₂=0.2	λ₁=1, λ₂=0
PR	0.885	0.890	0.906	0.884
SR	0.719	0.723	0.738	0.718

Results by attribute (each cell: PR/SR, %)
Algorithm	OCC	LSV	FM	LI	TC	SO	DEF	ALL
SiamFC^[4]	70.2/55.9	78.7/63.5	72.7/60.4	61.5/50.7	74.7/59.5	72.4/55.2	53.8/45.0	65.5/54.0
SiamDW^[36]	63.4/49.2	72.0/55.7	63.2/48.4	68.8/55.1	68.4/53.6	73.2/53.4	69.8/55.9	68.8/55.0
SiamDW^[36]+RGBT	67.5/53.6	68.9/56.5	71.1/57.6	70.0/58.8	63.5/51.7	76.4/58.8	69.1/58.2	68.0/56.5
MDNet^[18]+RGBT	82.9/64.1	77.0/57.3	80.5/59.8	79.5/64.3	79.5/60.9	87.0/62.2	81.6/68.8	80.0/63.7
MDNet^[18]	77.2/58.3	81.7/59.4	78.2/56.0	82.8/64.7	79.9/59.7	87.9/61.9	83.2/68.9	81.2/63.3
CMR^[35]	82.5/62.6	83.9/64.7	83.8/64.7	85.5/65.8	84.4/64.9	84.8/64.2	84.8/64.4	82.7/64.3
SGT^[34]	81.0/56.7	84.2/54.7	79.9/55.9	88.4/65.1	84.8/61.5	91.7/61.8	91.9/73.3	85.1/62.8
LTDA^[33]	84.6/63.5	84.8/64.4	84.8/64.2	85.1/66.3	85.3/66.0	86.7/66.3	86.9/67.6	84.3/67.7
DAPNet^[16]	87.3/67.4	86.0/66.1	85.2/65.3	86.9/67.7	87.5/68.0	88.6/68.2	89.1/69.6	88.2/70.7
MaCNet^[32]	87.6/68.7	84.6/67.3	82.3/65.9	89.4/73.1	89.2/69.7	95.0/69.5	92.6/76.5	88.0/71.4
Proposed algorithm	86.6/70.8	89.6/71.7	86.4/70.6	93.1/75.5	88.5/71.6	90.1/70.3	93.0/75.0	90.6/73.8
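The claim of better overall performance can be checked against the table directly. The sketch below copies the table values as they appear above and finds the best tracker per attribute for each metric. The variable and function names are illustrative, and the row label `Proposed` stands for the last row of the table.

```python
# Values copied from the attribute table above; each cell is "PR/SR" in percent.
ATTRS = ["OCC", "LSV", "FM", "LI", "TC", "SO", "DEF", "ALL"]
ROWS = {
    "SiamFC": "70.2/55.9 78.7/63.5 72.7/60.4 61.5/50.7 74.7/59.5 72.4/55.2 53.8/45.0 65.5/54.0",
    "SiamDW": "63.4/49.2 72.0/55.7 63.2/48.4 68.8/55.1 68.4/53.6 73.2/53.4 69.8/55.9 68.8/55.0",
    "SiamDW+RGBT": "67.5/53.6 68.9/56.5 71.1/57.6 70.0/58.8 63.5/51.7 76.4/58.8 69.1/58.2 68.0/56.5",
    "MDNet+RGBT": "82.9/64.1 77.0/57.3 80.5/59.8 79.5/64.3 79.5/60.9 87.0/62.2 81.6/68.8 80.0/63.7",
    "MDNet": "77.2/58.3 81.7/59.4 78.2/56.0 82.8/64.7 79.9/59.7 87.9/61.9 83.2/68.9 81.2/63.3",
    "CMR": "82.5/62.6 83.9/64.7 83.8/64.7 85.5/65.8 84.4/64.9 84.8/64.2 84.8/64.4 82.7/64.3",
    "SGT": "81.0/56.7 84.2/54.7 79.9/55.9 88.4/65.1 84.8/61.5 91.7/61.8 91.9/73.3 85.1/62.8",
    "LTDA": "84.6/63.5 84.8/64.4 84.8/64.2 85.1/66.3 85.3/66.0 86.7/66.3 86.9/67.6 84.3/67.7",
    "DAPNet": "87.3/67.4 86.0/66.1 85.2/65.3 86.9/67.7 87.5/68.0 88.6/68.2 89.1/69.6 88.2/70.7",
    "MaCNet": "87.6/68.7 84.6/67.3 82.3/65.9 89.4/73.1 89.2/69.7 95.0/69.5 92.6/76.5 88.0/71.4",
    "Proposed": "86.6/70.8 89.6/71.7 86.4/70.6 93.1/75.5 88.5/71.6 90.1/70.3 93.0/75.0 90.6/73.8",
}


def parse(row):
    """Turn 'PR/SR PR/SR ...' into a list of (PR, SR) float pairs."""
    return [tuple(float(x) for x in cell.split("/")) for cell in row.split()]


table = {name: parse(row) for name, row in ROWS.items()}


def best_per_attribute(metric):
    """metric: 0 for PR, 1 for SR. Returns {attribute: best tracker name}."""
    return {
        attr: max(table, key=lambda name: table[name][i][metric])
        for i, attr in enumerate(ATTRS)
    }


best_pr = best_per_attribute(0)
best_sr = best_per_attribute(1)
pr_wins = sum(1 for name in best_pr.values() if name == "Proposed")
sr_wins = sum(1 for name in best_sr.values() if name == "Proposed")
print(f"Proposed is best on PR in {pr_wins}/8 columns and on SR in {sr_wins}/8 columns")
```

With these values, MaCNet has the highest PR on OCC, TC and SO and the highest SR on DEF. The proposed algorithm leads the remaining columns, including ALL for both metrics.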