Self-supervised contrastive representation learning for semantic segmentation

doi:10.19665/j.issn1001-2400.20230304

Abstract

To improve the accuracy of semantic segmentation models and avoid the labor and time costs of pixel-wise annotation for large-scale semantic segmentation datasets, this paper studies pre-training by self-supervised contrastive representation learning and designs the Global-Local Cross Contrastive Learning (GLCCL) method around the characteristics of the semantic segmentation task. The method feeds whole (global) images, together with the series of image patches obtained by chunking them locally, into the network to extract global and local visual representations, respectively. Training is guided by a loss function composed of global contrast, local contrast, and global-local cross contrast terms, so that the network learns global and local visual representations as well as cross-regional semantic correlations. When BiSeNet is pre-trained with this method and transferred to the semantic segmentation task, mean intersection over union (MIoU) improves by 0.24 percentage points over existing self-supervised contrastive representation learning and by 0.9 percentage points over supervised pre-training. Experimental results show that pre-training a semantic segmentation model on unlabeled data with this method improves segmentation results, which is of practical value.
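The abstract names the three loss terms but does not give their exact form. The following is a minimal sketch only, assuming each term is an InfoNCE-style loss over L2-normalised embeddings, that the three terms are summed with equal weight, that the cross term compares the mean patch embedding of one view with the global embedding of the other, and a temperature of 0.2. None of these choices is stated in the source, and the function names (`info_nce`, `glccl_style_loss`, `mean_vec`) are hypothetical.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def mean_vec(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def info_nce(queries, keys, temperature=0.2):
    """InfoNCE: queries[i] should match keys[i]; all other keys are negatives."""
    queries = [normalize(q) for q in queries]
    keys = [normalize(k) for k in keys]
    total = 0.0
    for i, q in enumerate(queries):
        logits = [sum(a * b for a, b in zip(q, k)) / temperature for k in keys]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / len(queries)

def glccl_style_loss(global_q, global_k, local_q, local_k, temperature=0.2):
    """Sum of global, local and global-local cross terms (equal weights assumed).

    global_q / global_k: one embedding per image from two augmented views.
    local_q / local_k: per image, a list of patch embeddings from the two views.
    """
    loss_global = info_nce(global_q, global_k, temperature)
    # Local term: every patch is contrasted with its counterpart in the other view.
    flat_q = [p for patches in local_q for p in patches]
    flat_k = [p for patches in local_k for p in patches]
    loss_local = info_nce(flat_q, flat_k, temperature)
    # Cross term (assumed form): pooled patches of one view vs. global embedding of the other.
    pooled_q = [mean_vec(patches) for patches in local_q]
    loss_cross = info_nce(pooled_q, global_k, temperature)
    return loss_global + loss_local + loss_cross

# Toy embeddings for two images, two patches each (illustrative values only).
global_q = [[1.0, 0.0], [0.0, 1.0]]
global_k = [[1.0, 0.0], [0.0, 1.0]]
local_q = [[[1.0, 0.0], [1.0, 0.1]], [[0.0, 1.0], [0.1, 1.0]]]
local_k = [[[1.0, 0.0], [1.0, 0.1]], [[0.0, 1.0], [0.1, 1.0]]]

aligned = glccl_style_loss(global_q, global_k, local_q, local_k)
shuffled = glccl_style_loss(global_q, global_k[::-1], local_q, local_k)
```

With these toy inputs the loss is lower when the two views of each image are paired correctly (`aligned`) than when the global keys are swapped between images (`shuffled`), which is the behaviour a contrastive objective is meant to reward.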

Key words: semantic segmentation, self-supervised representation learning, contrastive learning, deep learning

CLC Number: TP391.4

LIU Bochong, CAI Huaiyu, WANG Yi, CHEN Xiaodong. Self-supervised contrastive representation learning for semantic segmentation[J]. Journal of Xidian University, 2024, 51(1): 125-134.

Figures/Tables 11

PatchNum	BiSeNet MIoU/%	CGNet MIoU/%	LMBANet MIoU/%
2×2	69.31	65.94	72.49
4×4	69.90	66.30	72.91
8×8	69.34	66.28	72.89
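PatchNum above presumably denotes the grid used to chunk each image into local patches. A minimal sketch of such chunking in plain Python, assuming non-overlapping, equally sized patches and image dimensions divisible by the grid size (the source does not say how overlaps or remainders are handled); `split_into_patches` is a hypothetical helper.

```python
def split_into_patches(image, rows, cols):
    """Split a 2-D image (list of pixel rows) into rows x cols non-overlapping patches.

    Patches are returned in row-major order; each patch is itself a list of pixel rows.
    """
    height, width = len(image), len(image[0])
    if height % rows or width % cols:
        raise ValueError("image size must be divisible by the patch grid")
    patch_h, patch_w = height // rows, width // cols
    patches = []
    for r in range(rows):
        for c in range(cols):
            patch = [
                row[c * patch_w:(c + 1) * patch_w]
                for row in image[r * patch_h:(r + 1) * patch_h]
            ]
            patches.append(patch)
    return patches

# A 4x4 toy "image" whose pixel value encodes its position.
image = [[y * 4 + x for x in range(4)] for y in range(4)]
patches = split_into_patches(image, 2, 2)
```

Here a 2×2 grid yields four 2×2-pixel patches; a 4×4 or 8×8 grid on a real image gives more, smaller patches.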

BiSeNet
Pre-training method	MIoU/% (gain over no pre-training)
No pre-training	68.40
Supervised pre-training	69.00 (+0.60)
MoCo	69.66 (+1.26)
GLCCL	69.90 (+1.50)

CGNet
Pre-training method	MIoU/% (gain over no pre-training)
No pre-training	64.80
Supervised pre-training	65.99 (+1.19)
MoCo	65.44 (+0.64)
GLCCL	66.30 (+1.50)

LMBANet
Pre-training method	MIoU/% (gain over no pre-training)
No pre-training	71.86
Supervised pre-training	72.47 (+0.61)
MoCo	72.23 (+0.37)
GLCCL	72.91 (+1.05)
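The MIoU/% columns report mean intersection over union. A minimal sketch of the standard definition computed from a confusion matrix; skipping classes that are absent from both prediction and ground truth is an assumption, `mean_iou` is a hypothetical helper, and the toy labels are illustrative only, not data from the paper.

```python
def mean_iou(y_true, y_pred, num_classes):
    """Mean intersection over union across classes for flat label sequences."""
    # confusion[t][p] counts pixels with ground-truth class t predicted as class p.
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        confusion[t][p] += 1
    ious = []
    for c in range(num_classes):
        intersection = confusion[c][c]
        truth_total = sum(confusion[c])
        pred_total = sum(row[c] for row in confusion)
        union = truth_total + pred_total - intersection
        if union > 0:  # assumption: ignore classes absent from both label sets
            ious.append(intersection / union)
    return sum(ious) / len(ious)

# Six toy pixels, three classes.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
miou = mean_iou(y_true, y_pred, 3)
```

For the toy labels the per-class IoUs are 1/3, 2/3 and 1/2, so the mean is 0.5; multiplying by 100 gives the percentage form used in the tables.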
