Real world image tampering localization combining the self-attention mechanism and convolutional neural networks

doi:10.19665/j.issn1001-2400.20230213

Abstract

Images are an important carrier of information in the era of the mobile Internet, which makes malicious image tampering a potential cybersecurity threat. Unlike tampering in natural scenes, which operates at the object scale, image tampering in the real world appears in forged qualification certificates, documents, screenshots, and similar material. Such images are usually produced through elaborate manual tampering, so their tampering features differ from those of natural scenes and are more diverse, which makes localizing the tampered regions more challenging. Because rich dependency information is important for capturing these complex and diverse features, this paper uses a convolutional neural network for adaptive feature extraction and applies reversely connected full self-attention modules for multi-stage feature attention. The tampered region is then localized by fusing the multi-stage attention results. The proposed method outperforms the comparison methods on the real world image tampering localization task: its F₁ score is about 8.98 percentage points higher than that of the mainstream method MVSS-Net, and its AUC is about 3.58 percentage points higher. The proposed method also matches the performance of mainstream methods on the natural scene image tampering localization task, and the results provide evidence that natural scene tampering features differ from real world tampering features. Experimental results in both scenarios show that the proposed method can effectively localize the tampered regions of tampered images, and that it is more effective in complex real world scenes.

Key words: image tampering localization, forgery detection, digital image forensics, computer vision, self-attention mechanism, convolutional neural networks

CLC number:

TP391.41

ZHONG Hao, BIAN Shan, WANG Chuntao. Real world image tampering localization combining the self-attention mechanism and convolutional neural networks[J]. Journal of Xidian University, 2024, 51(1): 135-146.

Figures/Tables (9): Figures 1-6, Tables 1-3

References (40)

[1]	DONG J, WANG W, TAN T. CASIA Image Tampering Detection Evaluation Database[C]//Proceedings of the 2013 IEEE China Summit and International Conference on Signal and Information Processing. Piscataway: IEEE, 2013:422-426.
[2]	GUAN H, KOZAK M, ROBERTSON E, et al. MFC Datasets:Large-Scale Benchmark Datasets for Media Forensic Challenge Evaluation[C]//Proceedings of the 2019 IEEE Winter Applications of Computer Vision Workshops. Piscataway: IEEE, 2019:63-72.
[3]	HSU Y, CHANG S. Detecting Image Splicing Using Geometry Invariants and Camera Characteristics Consistency[C]//Proceedings of the 2006 IEEE International Conference on Multimedia and Expo. Piscataway: IEEE, 2006:549-552.
[4]	WEN B, ZHU Y, SUBRAMANIAN R, et al. COVERAGE-A Novel Database for Copy-Move Forgery Detection[C]//Proceedings of the 2016 IEEE International Conference on Image Processing. Piscataway: IEEE, 2016:161-165.
[5]	FRIDRICH J, SOUKAL D, LUKÁŠ J. Detection of Copy-Move Forgery in Digital Images[C]//Proceedings of the Digital Forensic Research Workshop. Cleveland: Digital Forensic Research, 2003:1-10.
[6]	POPESCU A C, FARID H. Exposing Digital Forgeries by Detecting Duplicated Image Regions[R]. USA: Dartmouth College, 2004.
[7]	ZIMBA M, SUN X M. DWT-PCA(EVD) Based Copy-Move Image Forgery Detection[J]. International Journal of Digital Content Technology and its Applications, 2011, 5(1):251-258. doi: 10.4156/jdcta
[8]	HUANG H, GUO W, ZHANG Y. Detection of Copy-Move Forgery in Digital Images Using SIFT Algorithm[C]//Proceedings of the 2008 IEEE Pacific-Asia Workshop on Computational Intelligence and Industrial Application. Piscataway: IEEE, 2008:272-276.
[9]	PAN X, LYU S. Region Duplication Detection Using Image Feature Matching[J]. IEEE Transactions on Information Forensics and Security, 2010, 5(4):857-867. doi: 10.1109/TIFS.2010.2078506
[10]	FERRARA P, BIANCHI T, DE ROSA A, et al. Image Forgery Localization via Fine-Grained Analysis of CFA Artifacts[J]. IEEE Transactions on Information Forensics and Security, 2012, 7(5):1566-1577. doi: 10.1109/TIFS.2012.2202227
[11]	FARID H. Exposing Digital Forgeries from JPEG Ghosts[J]. IEEE Transactions on Information Forensics and Security, 2009, 4(1):154-160. doi: 10.1109/TIFS.2008.2012215
[12]	LI H, LUO W, HUANG J. Localization of Diffusion-Based Inpainting in Digital Images[J]. IEEE Transactions on Information Forensics and Security, 2017, 12(12):3050-3064. doi: 10.1109/TIFS.2017.2730822
[13]	RAO Y, NI J. A Deep Learning Approach to Detection of Splicing and Copy-Move Forgeries in Images[C]//Proceedings of the 2016 IEEE International Workshop on Information Forensics and Security. Piscataway: IEEE, 2016:1-6.
[14]	LI H, HUANG J. Localization of Deep Inpainting Using High-Pass Fully Convolutional Network[C]//Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019:8300-8309.
[15]	BAYAR B, STAMM M C. Constrained Convolutional Neural Networks:A New Approach Towards General Purpose Image Manipulation Detection[J]. IEEE Transactions on Information Forensics and Security, 2018, 13(11):2691-2706. doi: 10.1109/TIFS.2018.2825953
[16]	SALLOUM R, REN Y, KUO C C J. Image Splicing Localization Using A Multi-Task Fully Convolutional Network(MFCN)[J]. Journal of Visual Communication and Image Representation, 2018, 51:201-209. doi: 10.1016/j.jvcir.2018.01.010
[17]	ZHOU P, HAN X, MORARIU V I, et al. Learning Rich Features for Image Manipulation Detection[C]//Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018:1053-1061.
[18]	XU W, LUO J, ZHU C, et al. Document Images Forgery Localization Using a Two-Stream Network[J]. International Journal of Intelligent Systems, 2022, 37:5272-5289. doi: 10.1002/int.v37.8
[19]	WU Y, ABDALMAGEED W, NATARAJAN P. ManTra-Net:Manipulation Tracing Network for Detection and Localization of Image Forgeries with Anomalous Features[C]//Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019:9535-9544.
[20]	DONG C, CHEN X, HU R, et al. MVSS-Net:Multi-View Multi-Scale Supervised Networks for Image Manipulation Detection[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(3):3539-3553.
[21]	HE K, ZHANG X, REN S, et al. Deep Residual Learning for Image Recognition[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016:770-778.
[22]	BI X, WEI Y, XIAO B, et al. RRU-Net:The Ringed Residual U-Net for Image Splicing Forgery Detection[C]//Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. Piscataway: IEEE, 2019:30-39.
[23]	HAO J, ZHANG Z, YANG S, et al. TransForensics:Image Forgery Localization with Dense Self-Attention[C]//Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021:15035-15044.
[24]	ZHUANG P, LI H, TAN S, et al. Image Tampering Localization Using A Dense Fully Convolutional Network[J]. IEEE Transactions on Information Forensics and Security, 2021, 16:2986-2999. doi: 10.1109/TIFS.2021.3070444
[25]	HU X, ZHANG Z, JIANG Z, et al. SPAN:Spatial Pyramid Attention Network for Image Manipulation Localization[C]//Proceedings of the European Conference on Computer Vision. Cham: Springer, 2020:312-328.
[26]	WU H, ZHOU J, TIAN J, et al. Robust Image Forgery Detection against Transmission over Online Social Networks[J]. IEEE Transactions on Information Forensics and Security, 2022, 17:443-456. doi: 10.1109/TIFS.2022.3144878
[27]	LIU Z, MAO H, WU C Y, et al. A ConvNet for the 2020s[C]//Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022:11976-11986.
[28]	KWON M J, YU I J, NAM S H, et al. CAT-Net:Compression Artifact Tracing Network for Detection and Localization of Image Splicing[C]//Proceedings of the 2021 IEEE Winter Conference on Applications of Computer Vision. Piscataway: IEEE, 2021:375-384.
[29]	BOROUMAND M, CHEN M, FRIDRICH J. Deep Residual Network for Steganalysis of Digital Images[J]. IEEE Transactions on Information Forensics and Security, 2019, 14(5):1181-1193. doi: 10.1109/TIFS.2018.2871749
[30]	DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An Image is Worth 16x16 Words:Transformers for Image Recognition at Scale(2020)[C/OL].[2020-10-22].https://arxiv.org/abs/2010.11929.
[31]	WANG W, XIE E, LI X, et al. Pyramid Vision Transformer:A Versatile Backbone for Dense Prediction without Convolutions[C]//Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021:548-558.
[32]	HOWARD A G, ZHU M, CHEN B, et al. MobileNets:Efficient Convolutional Neural Networks for Mobile Vision Applications(2017)[J/OL].[2017-04-17].https://arxiv.org/abs/1704.04861.
[33]	HU J, SHEN L, ALBANIE S, et al. Squeeze-and-Excitation Networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42(8):2011-2023.
[34]	LIU Di, GUO Jichang, WANG Yudong, et al. Multi-Scale Salient Object Detection Network Combining an Attention Mechanism[J]. Journal of Xidian University, 2022, 49(4):118-126.
[35]	XIE E, WANG W, YU Z, et al. SegFormer:Simple and Efficient Design for Semantic Segmentation with Transformers[J]. Advances in Neural Information Processing Systems, 2021, 34:12077-12090.
[36]	MILLETARI F, NAVAB N, AHMADI S A. V-Net:Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation[C]//Proceedings of the 2016 Fourth International Conference on 3D Vision. Piscataway: IEEE, 2016:565-571.
[37]	ALIBABA SECURITY, CHINA SOCIETY OF IMAGE AND GRAPHICS(CSIG). Real-World Image Forgery Localization Challenge(2022)[DB/OL].[2022-11-01].https://tianchi.aliyun.com/competition/entrance/531945/introduction.
[38]	LIN T Y, DOLLÁR P, GIRSHICK R, et al. Feature Pyramid Networks for Object Detection[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017:936-944.
[39]	ZOPH B, GHIASI G, LIN T Y, et al. Rethinking Pre-Training and Self-Training[J]. Advances in Neural Information Processing Systems, 2020, 33:3833-3845.
[40]	ZHANG J, SANG J, YI Q, et al. ImageNet Pre-Training also Transfers Non-Robustness[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto: AAAI, 2023:3436-3444.

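Ablation setting	Structure	Network 1	Network 2	Network 3	Network 4	Network 5	Network 6
Self-attention module	Full self-attention module	√	-	√	√	√	√
Self-attention module	Vanilla self-attention module	-	√	-	-	-	-
Self-attention module connection	Reverse connection	√	√	-	-	√	√
Self-attention module connection	Forward connection	-	-	√	-	-	-
Self-attention module connection	No connection	-	-	-	√	-	-
Decoder module	Stage fusion module	√	√	√	√	-	√
Decoder module	Feature pyramid	-	-	-	-	√	-
Loss function	Log regularization	√	√	√	√	√	-
Loss function	No Log regularization	-	-	-	-	-	√
Evaluation metric/%	F₁	63.06	62.08	62.40	62.21	59.68	61.36
Evaluation metric/%	IoU	51.49	50.38	50.48	50.36	48.04	49.66
Evaluation metric/%	MCC	62.48	61.51	61.85	61.52	59.17	60.86
Evaluation metric/%	AUC	90.21	88.05	89.60	89.93	87.87	88.73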

Method	Dataset
	NIST[2]	COLUMBIA[3]	CASIA V1.0[1]	COVERAGE[4]
Proposed method++	32.7	75.0	44.1	45.2
MVSS-Net++	30.4	66.0	51.3	48.2
ManTra-Net	0.0	36.4	15.5	28.6

Method	Evaluation metric/%
	F₁	IoU	MCC	AUC
Proposed method	63.06	51.49	62.48	90.21
Proposed method*	57.87	46.03	57.44	86.81
MVSS-Net	54.08	42.11	53.31	86.63
MVSS-Net*	53.45	41.30	52.74	84.85
RRU-Net	45.92	34.42	45.08	75.44
ManTra-Net	28.09	18.57	26.50	76.84
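
F₁, IoU, MCC, and AUC, as reported in the tables, are standard metrics for binary segmentation. The following is a minimal sketch of how such values can be computed from a predicted tampering map and a ground-truth mask. It assumes pixel-level evaluation and a binarization threshold of 0.5; the function names, the threshold, and the toy data are illustrative assumptions and are not taken from the authors' evaluation code.

```python
import math
from typing import Sequence, Tuple


def confusion_counts(pred: Sequence[int], truth: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for flattened binary masks (1 = tampered pixel)."""
    if len(pred) != len(truth):
        raise ValueError("masks must contain the same number of pixels")
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)
    return tp, fp, fn, tn


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def iou_score(tp: int, fp: int, fn: int) -> float:
    """IoU = TP / (TP + FP + FN); defined as 0 when the denominator is 0."""
    denom = tp + fp + fn
    return tp / denom if denom else 0.0


def mcc_score(tp: int, fp: int, fn: int, tn: int) -> float:
    """Matthews correlation coefficient; defined as 0 when the denominator is 0."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auc_score(scores: Sequence[float], truth: Sequence[int]) -> float:
    """AUC as the probability that a tampered pixel scores higher than a
    pristine one, with ties counted as one half (quadratic-time version)."""
    pos = [s for s, t in zip(scores, truth) if t == 1]
    neg = [s for s, t in zip(scores, truth) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one tampered and one pristine pixel")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data (hypothetical): six pixels of a predicted probability map.
    scores = [0.9, 0.8, 0.2, 0.6, 0.3, 0.1]
    truth = [1, 1, 1, 0, 0, 0]
    pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

    tp, fp, fn, tn = confusion_counts(pred, truth)
    print(round(f1_score(tp, fp, fn), 4))       # 0.6667
    print(round(iou_score(tp, fp, fn), 4))      # 0.5
    print(round(mcc_score(tp, fp, fn, tn), 4))  # 0.3333
    print(round(auc_score(scores, truth), 4))   # 0.7778
```

The pairwise AUC loop is quadratic in the number of pixels, so it only suits small examples; on full-resolution masks a rank-based implementation would be the practical choice.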