Journal of Xidian University, 2023, Vol. 50, Issue (4): 121-131. DOI: 10.19665/j.issn1001-2400.2023.04.012

• Cyberspace Security Column •

Privacy-Preserving Byzantine-Robust Federated Learning Algorithm

LI Haiyang1, GUO Jingjing1, LIU Jiuzun1, LIU Zhiquan2,3

  1. School of Cyber Engineering, Xidian University, Xi’an 710071, China
    2. College of Information Science and Technology, Jinan University, Guangzhou 510632, China
    3. Cyberdataforce (Beijing) Technology Ltd., Beijing 100020, China
  • Received: 2023-01-15  Online: 2023-08-20  Published: 2023-10-17
  • Corresponding author: GUO Jingjing
  • About the authors: LI Haiyang (1996-), male, M.S. candidate at Xidian University, E-mail: ocean5160@163.com; LIU Jiuzun (1997-), male, M.S. candidate at Xidian University, E-mail: jzliu@stu.xidian.edu.cn; LIU Zhiquan (1989-), male, associate research fellow, E-mail: zqliu@jnu.edu.cn
  • Supported by: the Natural Science Basic Research Program of Shaanxi Province (2022JQ-603); the National Natural Science Foundation of China (62032025, 62272195); the Fundamental Research Funds for the Central Universities (ZYTS23161, 21622402); the Key Laboratory of Network and Information Security Vulnerability Research of Guangdong Province (2020B1212060081); the Science and Technology Program of Guangzhou (202201010421)

Abstract:

Federated learning is a distributed machine learning paradigm in which the nodes' raw training sets never leave the local devices; instead, the nodes collaboratively train a machine learning model by sharing model updates. Most current research on privacy preservation and on Byzantine attack detection in federated learning has been carried out independently, and existing Byzantine attack detection methods cannot be applied directly in a privacy-preserving environment, which does not meet the practical requirements of federated learning. To address these problems, this paper proposes a federated learning algorithm that is Byzantine-robust in a privacy-preserving environment with non-independent and identically distributed (non-IID) data. First, differential privacy provides privacy protection for the model updates (local model gradient information); then the credibility of each node's current state is evaluated on the basis of the historical model updates uploaded by that node; finally, global model aggregation is performed according to the evaluation results. Simulation results show that, in a federated learning environment with non-IID node training sets, privacy protection, and a Byzantine node ratio of 20% to 80%, the proposed algorithm detects Byzantine nodes with both a missed detection rate and a false detection rate of 0%. Meanwhile, the time overhead of Byzantine node detection grows linearly with the number of nodes. Compared with existing Byzantine node detection algorithms, the proposed algorithm obtains a global model with higher accuracy under non-IID data and model privacy protection.
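
To make the pipeline described in the abstract concrete (differentially private model updates, credibility evaluation from each node's update history, and credibility-weighted aggregation), the following minimal NumPy sketch may help. It is not the authors' implementation: the Gaussian-mechanism perturbation, the cosine-similarity credibility score, the threshold tau, and all function names (dp_perturb, credibility, aggregate) are illustrative assumptions introduced here, not details taken from the paper.

```python
import numpy as np

def dp_perturb(grad, clip_norm=1.0, sigma=0.5, rng=None):
    """Clip a local gradient and add Gaussian noise (generic DP-SGD-style
    perturbation; the paper's actual mechanism and parameters may differ)."""
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(grad)
    clipped = grad * min(1.0, clip_norm / (norm + 1e-12))
    return clipped + rng.normal(0.0, sigma * clip_norm, size=grad.shape)

def credibility(history, current):
    """Toy credibility score: cosine similarity between a node's current
    update and the mean of its own historical updates (an assumed heuristic,
    not the paper's exact evaluation rule)."""
    if not history:
        return 1.0
    ref = np.mean(history, axis=0)
    denom = np.linalg.norm(ref) * np.linalg.norm(current) + 1e-12
    return float(np.dot(ref, current) / denom)

def aggregate(updates, histories, tau=0.0):
    """Weight each perturbed update by its credibility; nodes scoring at or
    below the threshold tau are treated as Byzantine and excluded."""
    scores = np.array([credibility(h, u) for u, h in zip(updates, histories)])
    weights = np.where(scores > tau, scores, 0.0)
    if weights.sum() == 0.0:
        return np.zeros_like(updates[0])
    weights /= weights.sum()
    return np.sum([w * u for w, u in zip(weights, updates)], axis=0)

# Tiny round with 4 nodes; node 3 sends a sign-flipped, scaled gradient.
rng = np.random.default_rng(42)
true_grad = rng.normal(size=10)
local = [true_grad + 0.1 * rng.normal(size=10) for _ in range(3)]
local.append(-5.0 * true_grad)                        # Byzantine update
noisy = [dp_perturb(g, rng=rng) for g in local]       # DP-perturbed uploads
hist = [[dp_perturb(true_grad, rng=rng)] for _ in range(4)]
print(aggregate(noisy, hist))
```

In this toy run the Byzantine node's update points opposite to its own history, so its cosine score is negative and it receives zero weight, while the honest, noise-perturbed updates are averaged; the paper's actual credibility evaluation and aggregation rule should be taken from the full text.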

Key words: federated learning, Byzantine attack, anomaly detection, privacy-preserving techniques, differential privacy

CLC Number: 

  • TP39