Vehicle-target detection network for SAR images based on the attention mechanism

doi:10.19665/j.issn1001-2400.2023.01.005

Abstract

In vehicle-target detection in synthetic aperture radar (SAR) images, locating vehicle contours provides both the position of each vehicle and a basis for analyzing its state, which makes it a key step in SAR image understanding. However, the multiplicative speckle noise in SAR images interferes with contour localization and makes vehicle-target detection more difficult. To address this problem, this paper proposes an attention-based neural network for pixel-level vehicle detection in SAR images, which consists of a target filtering module, a target locating module and a contour refining module. The target filtering module applies channel attention and self-attention within a lightweight feature extraction network to enhance feature expression; it suppresses the effect of speckle, quickly selects the images that contain targets, and outputs a stable location heat map for the next module. The target locating module uses a mask (foreground-background) cross-attention mechanism to optimize the coarse-scale features according to the location heat map and thereby refine the target location, and it incorporates fine-scale information to improve the details of the target contour. The contour refining module screens contour points to eliminate the uncertain points caused by upsampling and speckle noise, yielding accurate confidence values for contour pixels. To test the network, the vehicles in the MSTAR dataset are annotated at the pixel level, and a SAR vehicle image dataset and a large-scene image dataset are built. Experimental results show that the network achieves good pixel-level detection performance and can detect vehicle targets in large-scene SAR images quickly and accurately.

Key words: vehicle detection, deep learning, attention mechanism, synthetic aperture radar (SAR), pixel-level target detection
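The abstract names channel attention as one ingredient of the target filtering module but does not give its exact form. As a rough illustration only, the sketch below follows the widely used squeeze-and-excitation pattern (global average pooling, a small bottleneck, then per-channel gates). The function name `channel_attention`, the random weights `w1` and `w2`, the toy shapes and the reduction ratio `r` are all assumptions made for this example, not details taken from the paper.

```python
import math
import random

def channel_attention(features, w1, w2):
    """SE-style channel attention on C channels, each an H x W nested list.

    w1 has shape (C // r) x C and w2 has shape C x (C // r). Both are
    hypothetical weights that a real network would learn during training.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
                for ch in features]
    # Excitation, step 1: bottleneck fully connected layer followed by ReLU.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    # Excitation, step 2: fully connected layer followed by a sigmoid gate.
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Scale: multiply every value of a channel by that channel's gate.
    scaled = [[[g * v for v in row] for row in ch]
              for ch, g in zip(features, gates)]
    return scaled, gates

# Toy input: 4 channels of 3 x 3 values, reduction ratio 2 (assumed values).
random.seed(0)
C, H, W, r = 4, 3, 3, 2
features = [[[random.random() for _ in range(W)] for _ in range(H)]
            for _ in range(C)]
w1 = [[random.uniform(-1, 1) for _ in range(C)] for _ in range(C // r)]
w2 = [[random.uniform(-1, 1) for _ in range(C // r)] for _ in range(C)]

scaled, gates = channel_attention(features, w1, w2)
print([round(g, 3) for g in gates])
```

Each gate lies strictly between 0 and 1, so a channel dominated by noise can be attenuated while an informative channel passes through almost unchanged. Whether the proposed network uses this exact gating form is not stated in the abstract.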

CLC number: TP391

ZHANG Qiang, YANG Xinpeng, ZHAO Shixiang, WEI Dongdong, HAN Zhen. Vehicle-target detection network for SAR images based on the attention mechanism[J]. Journal of Xidian University, 2023, 50(1): 36-47.

Table 1
Method	MAE↓ (×10^-1)	E-measure↑	wF-measure↑	S-measure↑
FCN	0.407	0.9513	0.7601	0.9372
Mask R-CNN	0.580	0.9241	0.7164	0.8103
PFPN	0.229	0.9569	0.8887	0.9490
CPD	0.200	0.9714	0.9076	0.9445
C2FNet	0.173	0.9769	0.9102	0.9570
VTD-Net	0.125	0.9833	0.9436	0.9606
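MAE in the table above is conventionally computed as the mean absolute difference between a predicted per-pixel confidence map and the binary ground-truth mask. The following is a minimal sketch under that assumed definition; the function name `mean_absolute_error` and the 2 x 2 toy arrays are illustrative and do not come from the paper.

```python
def mean_absolute_error(pred, truth):
    """Mean absolute error between a confidence map and a binary mask.

    Both inputs are H x W nested lists; pred holds values in [0, 1] and
    truth holds 0 (background) or 1 (vehicle pixel).
    """
    total = 0.0
    count = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, g in zip(pred_row, truth_row):
            total += abs(p - g)
            count += 1
    return total / count

# Toy example: absolute errors are 0.1, 0.1, 0.2 and 0.0, so the mean is 0.1.
pred = [[0.9, 0.1], [0.8, 0.0]]
truth = [[1, 0], [1, 0]]
mae = mean_absolute_error(pred, truth)
print(f"MAE = {mae:.4f}")
```

Lower values are better, which is what the downward arrow in the column header indicates.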

Table 2
Method	MAE↓ (×10^-3)	E-measure↑	wF-measure↑	S-measure↑	Time↓ (s)
FCN	1.09	0.96402	0.82981	0.92813	9.421
Mask R-CNN	1.48	0.96389	0.79008	0.91835	140.204
PFPN	0.99	0.96600	0.87451	0.93939	27.003
CPD	0.81	0.98642	0.89311	0.94727	67.627
C2FNet	0.73	0.98403	0.90948	0.95168	33.484
VTD-Net	0.62	0.99076	0.92665	0.97450	8.267
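To make the time column easier to read, the short script below expresses each method's runtime as a multiple of the VTD-Net runtime. The runtimes are copied from the table above; the variable names are illustrative, and the ratios are plain arithmetic on the reported figures rather than new measurements.

```python
# Runtimes in seconds, copied from the time column of the table above.
runtime_s = {
    "FCN": 9.421,
    "Mask R-CNN": 140.204,
    "PFPN": 27.003,
    "CPD": 67.627,
    "C2FNet": 33.484,
    "VTD-Net": 8.267,
}

# Ratio of each method's runtime to the VTD-Net runtime (1.0 for VTD-Net itself).
baseline = runtime_s["VTD-Net"]
slowdown = {name: t / baseline for name, t in runtime_s.items()}

# Print the methods from fastest to slowest.
for name, ratio in sorted(slowdown.items(), key=lambda kv: kv[1]):
    print(f"{name:<11s} {ratio:5.2f}x")
```

On these figures VTD-Net has the shortest runtime, FCN is within about 15% of it, and Mask R-CNN takes roughly 17 times as long.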