Journal of Xidian University, 2021, Vol. 48, Issue (6): 23-31. doi: 10.19665/j.issn1001-2400.2021.06.004

• Special Column on Key Technologies of Intelligent Embedded System Architecture and Software •

An Enhanced Defense Algorithm Against Deep Adversarial Example Attacks

LIU Jiawei, ZHANG Wenhui, KOU Xiaoli, LI Yanni

  1. School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi 710071, China
  • Received: 2021-06-30 Online: 2021-12-20 Published: 2022-02-24
  • Corresponding authors: KOU Xiaoli, LI Yanni
  • About the authors: LIU Jiawei (1998-), male, M.S. student at Xidian University, E-mail: liujw@stu.xidian.edu.cn; ZHANG Wenhui (1996-), male, M.S. student at Xidian University, E-mail: wenhui110920@gmail.com
  • Supported by:
    the General Program of the National Natural Science Foundation of China (61472296)

Harnessing adversarial examples via input denoising and hidden information restoring

LIU Jiawei, ZHANG Wenhui, KOU Xiaoli, LI Yanni

  1. School of Computer Science and Technology,Xidian University,Xi’an 710071,China
  • Received: 2021-06-30 Online: 2021-12-20 Published: 2022-02-24
  • Contact: KOU Xiaoli, LI Yanni

Abstract:

Deep learning has achieved great success in a wide range of applications, but the robustness and performance of deep neural network models are highly vulnerable to attacks by adversarial examples carrying subtle perturbations. Existing denoising-based defense algorithms tend to destroy useful information in clean samples and thereby degrade the classification accuracy of the model. To address this defect, a new enhanced defense algorithm against adversarial example attacks is proposed, which adds an enhanced input denoiser to the target model together with a hidden-layer restorer, derived from convex hull theory, that recovers the lossy information of clean samples. The algorithm first trains a denoiser at the input layer of the model; the input of the denoiser is the union of clean and adversarial samples, and the denoiser is expected to remove adversarial perturbations while avoiding forgetting the clean samples. Second, since the denoiser inevitably damages useful information contained in the clean samples, a restorer is trained in a hidden layer of the model; the input of the restorer is a convex combination of the hidden vectors of clean and adversarial samples, and the restorer is expected to remap samples lying in the wrong classification space back to the correct classification space, so that a more robust model is obtained. Extensive comparative simulation experiments on several standard datasets show that the proposed denoiser and restorer can effectively improve the robustness of the model, and that their defense performance against adversarial examples is superior to that of many existing representative defense algorithms.
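
To make the first training stage concrete, the following is a minimal PyTorch-style sketch of the input-denoiser step described above. It is an illustration only: the names make_adversarial, denoiser and target_model are hypothetical placeholders, and the paper's actual denoiser architecture, attack method and loss are not specified in this abstract.

import torch
import torch.nn as nn

def train_denoiser_epoch(denoiser, target_model, loader, optimizer, device="cpu"):
    """One epoch of denoiser training on the union of clean and adversarial samples."""
    ce = nn.CrossEntropyLoss()
    denoiser.train()
    target_model.eval()  # the optimizer is assumed to hold only the denoiser's parameters
    for x_clean, y in loader:
        x_clean, y = x_clean.to(device), y.to(device)
        # Craft adversarial counterparts with any attack (e.g. PGD); hypothetical placeholder call.
        x_adv = make_adversarial(target_model, x_clean, y)
        # Union of clean and adversarial samples: the denoiser must strip the
        # perturbations without "forgetting" how to pass clean inputs through.
        x_all = torch.cat([x_clean, x_adv], dim=0)
        y_all = torch.cat([y, y], dim=0)
        logits = target_model(denoiser(x_all))
        loss = ce(logits, y_all)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

In this reading, pushing both halves of the batch through the same classification loss is what keeps the denoiser from over-fitting to adversarial inputs and forgetting the clean ones.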

Keywords: deep learning, adversarial examples, input denoiser, hidden-layer information restorer

Abstract:

Although deep learning has achieved great success in various applications, deep neural networks (DNNs) are vulnerable to attacks by adversarial samples with imperceptible perturbations, which greatly degrades the robustness and performance of DNNs. To overcome the weakness of existing denoising algorithms against adversarial samples, which destroy useful information in clean samples and thus reduce the classification accuracy of the model, this paper presents a novel enhanced denoising algorithm ID+HIR (Input Denoising and Hidden Information Restoring) for adversarial samples. ID+HIR consists of an enhanced input denoiser and a hidden lossy-information restorer based on convex hull theory. The algorithm first trains a denoiser at the input layer of the model, with the input of the denoiser being the concatenation of clean and adversarial samples, so that the denoiser is expected to remove the adversarial perturbations while avoiding forgetting the clean samples. Since the denoiser also damages useful information contained in the clean samples, a restorer is trained in the hidden layer of the model, with the input of the restorer being a convex combination of the hidden vectors of the clean and adversarial samples; the restorer is expected to remap samples located in the incorrect classification space back to the correct classification space, thus yielding a more robust model. Extensive comparative simulation experiments on several benchmark datasets show that the proposed denoiser and restorer can effectively improve the robustness of the model and that ID+HIR outperforms the competitive baselines.
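
As an illustration of the hidden-layer restorer, the sketch below shows a single training step under the assumption that the target model can be split into a feature extractor and a classifier head and that the hidden vectors are flattened to shape (batch, dim); the names feature_extractor, classifier and restorer are hypothetical, and the per-example sampling of the convex-combination coefficient is an assumption rather than the paper's prescription.

import torch
import torch.nn as nn

def train_restorer_step(restorer, feature_extractor, classifier,
                        x_clean, x_adv, y, optimizer):
    """One training step: restore a convex mix of clean/adversarial hidden vectors."""
    ce = nn.CrossEntropyLoss()
    with torch.no_grad():                      # the backbone stays frozen in this sketch
        h_clean = feature_extractor(x_clean)   # hidden vectors of clean samples, (B, D)
        h_adv = feature_extractor(x_adv)       # hidden vectors of adversarial samples, (B, D)
    # Convex combination: coefficients are non-negative and sum to one.
    lam = torch.rand(h_clean.size(0), 1, device=h_clean.device)
    h_mix = lam * h_clean + (1.0 - lam) * h_adv
    # The restorer should map the mixed (possibly misclassified) representation
    # back into the region where the classifier predicts the true label.
    logits = classifier(restorer(h_mix))
    loss = ce(logits, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

Sampling the coefficient per example covers the whole line segment between the two hidden vectors, which matches the convex-hull intuition in the abstract; a fixed coefficient would be a weaker special case.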

Key words: deep learning, adversarial samples, input denoising, hidden information restoring

CLC number:

  • TP183