Privacy preserving multi-classification LR scheme for data quality

doi:10.19665/j.issn1001-2400.20230601

Abstract

To protect the privacy of multi-classification logistic regression models in machine learning, ensure the quality of training data, and reduce computation and communication overhead, a privacy-preserving multi-classification logistic regression scheme for data quality is proposed. First, based on homomorphic encryption for arithmetic of approximate numbers, batching and the single-instruction multiple-data (SIMD) mechanism are used to pack multiple messages into one ciphertext, and an encrypted vector is securely shifted into the ciphertext corresponding to the shifted plaintext vector. Second, the "One vs Rest" decomposition strategy is adopted: by training multiple classifiers, the binary logistic regression model is extended to multi-classification. Finally, the training data set is divided into several fixed-size matrices that still retain the complete data structure of the sample information, and the fixed Hessian method is used to optimize the model parameters so that the scheme applies in any case and keeps the parameters private. During model training, the scheme reduces data sparsity and ensures data quality. The security analysis shows that neither the trained model nor the users' data is leaked at any point in the process. Experiments show that the training accuracy of the scheme is considerably higher than that of existing schemes and almost the same as that obtained by training on unencrypted data, and that the scheme has a lower computational overhead.

Key words: homomorphic encryption, cloud computing, logistic regression, privacy-preserving, data quality
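
The "One vs Rest" and fixed Hessian steps named in the abstract can be sketched on unencrypted data. The following is a minimal plaintext illustration, not the paper's encrypted protocol. It assumes the commonly used fixed Hessian bound -(1/4) X^T X in place of the true Hessian; the function names, the small ridge term, and the synthetic three-class data are hypothetical choices made only for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_binary_fixed_hessian(X, y, iters=20, ridge=1e-6):
    """Binary LR via Newton-style steps with a fixed curvature matrix.

    Assumption: the true Hessian -X^T W X is replaced by the bound
    -(1/4) X^T X, which does not depend on the current parameters,
    so it is inverted only once.
    """
    d = X.shape[1]
    # ridge is a hypothetical stabiliser that keeps the matrix invertible
    h_inv = np.linalg.inv(0.25 * X.T @ X + ridge * np.eye(d))
    beta = np.zeros(d)
    for _ in range(iters):
        grad = X.T @ (y - sigmoid(X @ beta))  # gradient of the log-likelihood
        beta = beta + h_inv @ grad            # step with the fixed curvature
    return beta

def fit_one_vs_rest(X, labels, **kwargs):
    """Train one binary classifier per class (that class vs. all others)."""
    classes = np.unique(labels)
    B = np.stack([
        fit_binary_fixed_hessian(X, (labels == c).astype(float), **kwargs)
        for c in classes
    ])
    return classes, B

def predict(X, classes, B):
    """Pick the class whose binary classifier gives the highest score."""
    return classes[np.argmax(X @ B.T, axis=1)]

def make_toy_data(seed=0, per_class=50):
    """Synthetic data: three Gaussian clusters in 2-D plus a bias column."""
    rng = np.random.default_rng(seed)
    centers = np.array([[0.0, 4.0], [4.0, -2.0], [-4.0, -2.0]])
    pts = np.vstack([c + rng.normal(size=(per_class, 2)) for c in centers])
    X = np.hstack([np.ones((pts.shape[0], 1)), pts])
    labels = np.repeat(np.arange(3), per_class)
    return X, labels

if __name__ == "__main__":
    X, labels = make_toy_data()
    classes, B = fit_one_vs_rest(X, labels)
    print("training accuracy:", np.mean(predict(X, classes, B) == labels))
```

Because the curvature matrix is constant across iterations, each update needs only matrix-vector products and one sigmoid evaluation, which is the property that makes this family of methods attractive when the arithmetic is carried out on ciphertexts.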

CLC number:

TP309.2

CAO Laicheng, WU Wentao, FENG Tao, GUO Xian. Privacy preserving multi-classification LR scheme for data quality[J]. Journal of Xidian University, 2023, 50(5): 188-198.

Figures/Tables: 7 (Fig. 1, Table 1, Fig. 1, Fig. 2, Fig. 3, Fig. 4, Table 2)

References (15)

[1]	XU W, WANG B, LIU J, et al. Toward Practical Privacy-Preserving Linear Regression[J]. Information Sciences, 2022, 596:119-136. doi: 10.1016/j.ins.2022.03.023
[2]	CHEN Y, HUANG R, YANG B. Efficient Batch Fully Homomorphic Encryption with a Shorter Key from Ring-LWE[J]. Applied Sciences, 2022, 12(17):8420. doi: 10.3390/app12178420
[3]	AHARONI E, DRUCKER N, EZOV G, et al. Complex Encoded Tile Tensors: Accelerating Encrypted Analytics[J]. IEEE Security & Privacy, 2022, 20(5):35-43.
[4]	DENG W, PENG Y, YANG F, et al. Feature Optimization and Hybrid Classification for Malicious Web Page Detection[J]. Concurrency and Computation: Practice and Experience, 2022, 34(16):e5859. doi: 10.1002/cpe.v34.16
[5]	SINHA S, SAHA S, ALAM M, et al. Exploring Bitslicing Architectures for Enabling FHE-Assisted Machine Learning[J]. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2022, 41(11):4004-4015. doi: 10.1109/TCAD.2022.3204909
[6]	JANG J, LEE Y, KIM A, et al. Privacy-Preserving Deep Sequential Model with Matrix Homomorphic Encryption[C]//Proceedings of the 2022 ACM on Asia Conference on Computer and Communications Security. New York: ACM, 2022:377-391.
[7]	YAN J, CAO J. Privacy Preservation of Optimization Algorithm over Unbalanced Directed Graph[J]. IEEE Transactions on Network Science and Engineering, 2022, 9(4):2164-2173. doi: 10.1109/TNSE.2022.3155481
[8]	JIA H, ALDEEN M S, ZHAO C, et al. Flexible Privacy-Preserving Machine Learning: When Searchable Encryption Meets Homomorphic Encryption[J]. International Journal of Intelligent Systems, 2022, 37(11):9173-9191. doi: 10.1002/int.v37.11
[9]	FU F, LIU S, CHENG Y. Vertical Federated Logistic Regression via Homomorphic Encryption and Secret Sharing[J]. Information and Communications Technology and Policy, 2022, 48(5):34-44.
[10]	ZHAO J, ZHU H, WANG F, et al. ACCEL: An Efficient and Privacy-Preserving Federated Logistic Regression Scheme over Vertically Partitioned Data[J]. Science China Information Sciences, 2022, 65(7):1-2.
[11]	EDEMACU K, KIM J W. Multi-Party Privacy-Preserving Logistic Regression with Poor Quality Data Filtering for IoT Contributors[J]. Electronics, 2021, 10(17):2049. doi: 10.3390/electronics10172049
[12]	YANG S, HUANG X. Universal Product Learning with Errors: A New Variant of LWE for Lattice-based Cryptography[J]. Theoretical Computer Science, 2022, 915:90-100. doi: 10.1016/j.tcs.2022.02.032
[13]	SONG D, VOLD A, MADAN K, et al. Multi-Label Legal Document Classification: A Deep Learning-Based Approach with Label-Attention and Domain-Specific Pre-Training[J]. Information Systems, 2022, 106:101718. doi: 10.1016/j.is.2021.101718
[14]	NGUYEN T, KARUNANAYAKE N, WANG S, et al. Privacy-Preserving Spam Filtering Using Homomorphic and Functional Encryption[J]. Computer Communications, 2023, 197:230-241. doi: 10.1016/j.comcom.2022.11.002
[15]	WIESE M, BOCHE H. Mosaics of Combinatorial Designs for Information-Theoretic Security[J]. Designs, Codes and Cryptography, 2022, 90(3):593-632. doi: 10.1007/s10623-021-00994-1

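Scheme	Multi-classification	Model privacy protection	Data quality during model training	Homomorphic encryption for arithmetic of approximate numbers
Ref. [9]	×	×	√	×
Ref. [10]	×	√	×	×
Ref. [11]	√	√	√	×
Privacy-preserving multi-classification logistic regression	√	√	√	√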
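
The last column of the comparison refers to the packing and shifting mechanism described in the abstract. The sketch below simulates it on plain Python lists: no encryption is performed, and `pack`, `rotate`, `add`, `mul`, and `inner_product` are hypothetical stand-ins for the slot-wise operations that a scheme of this kind exposes on ciphertexts. It shows how an inner product can be computed using only slot-wise multiplication, addition, and cyclic shifts, assuming a power-of-two slot count.

```python
def pack(values, slots):
    """Pad a message vector with zeros so it fills every slot of one
    (simulated) ciphertext."""
    if len(values) > slots:
        raise ValueError("too many values for one ciphertext")
    return list(values) + [0.0] * (slots - len(values))

def rotate(ct, k):
    """Cyclic left shift by k slots, mimicking a homomorphic rotation."""
    k %= len(ct)
    return ct[k:] + ct[:k]

def add(a, b):
    """Slot-wise addition of two simulated ciphertexts."""
    return [x + y for x, y in zip(a, b)]

def mul(a, b):
    """Slot-wise multiplication of two simulated ciphertexts."""
    return [x * y for x, y in zip(a, b)]

def inner_product(ct_x, ct_w):
    """Slot-wise multiply, then rotate-and-sum.

    Assumes the slot count is a power of two. After log2(slots)
    rotations every slot holds the full inner product.
    """
    acc = mul(ct_x, ct_w)
    step = len(acc) // 2
    while step >= 1:
        acc = add(acc, rotate(acc, step))
        step //= 2
    return acc

if __name__ == "__main__":
    x = pack([1.0, 2.0, 3.0], slots=8)
    w = pack([4.0, 5.0, 6.0], slots=8)
    print(inner_product(x, w))
```

The rotate-and-sum loop needs only a logarithmic number of shifts in the slot count, which is why packing many values into one ciphertext can reduce both the number of ciphertexts and the number of homomorphic operations.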

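Dataset	Samples	Features	Iterations	Sigmoid function	Data state	Encryption time/s	Training time/s	Accuracy/%
Heart Disease	298	13	19	Original function	Unencrypted	-	0.220	93.90
Heart Disease	298	13	7	Approximate function	Unencrypted	-	0.318	83.12
Heart Disease	298	13	7	Approximate function	Encrypted	4.05	756.000	83.10
Edinburgh	1 253	10	19	Original function	Unencrypted	-	0.389	95.80
Edinburgh	1 253	10	7	Approximate function	Unencrypted	-	0.502	84.29
Edinburgh	1 253	10	7	Approximate function	Encrypted	5.12	1 374.000	85.25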
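
The table distinguishes the original sigmoid from an approximate one. Homomorphic schemes of this kind evaluate only additions and multiplications, so the sigmoid is typically replaced by a polynomial. The table does not say which approximation was used; the sketch below assumes a degree-3 least-squares fit on the interval [-8, 8], and both the degree and the interval are hypothetical choices for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_sigmoid_poly(degree=3, bound=8.0, num=2001):
    """Least-squares polynomial fit of the sigmoid on [-bound, bound].

    degree and bound are illustrative assumptions, not values taken
    from the table above.
    """
    xs = np.linspace(-bound, bound, num)
    return np.polynomial.Polynomial.fit(xs, sigmoid(xs), degree)

def max_abs_error(poly, bound=8.0, num=401):
    """Largest gap between the polynomial and the sigmoid on a test grid."""
    xs = np.linspace(-bound, bound, num)
    return float(np.max(np.abs(poly(xs) - sigmoid(xs))))

if __name__ == "__main__":
    poly = fit_sigmoid_poly()
    print("coefficients (lowest degree first):", poly.convert().coef)
    print("max abs error on [-8, 8]:", max_abs_error(poly))
```

A low-degree polynomial keeps the multiplicative depth small but tracks the sigmoid only coarsely, which is one plausible reason why accuracy with an approximate function can differ from accuracy with the original function.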