Journal of Xidian University, 2023, Vol. 50, Issue (5): 188-198. doi: 10.19665/j.issn1001-2400.20230601

• Cyberspace Security •

Privacy preserving multi-classification LR scheme for data quality

CAO Laicheng, WU Wentao, FENG Tao, GUO Xian

  1. School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China
  • Received: 2023-01-15 Online: 2023-10-20 Published: 2023-11-21
  • About the authors: CAO Laicheng (born 1965), male, professor, E-mail: caolch@lut.edu.cn; WU Wentao (born 1996), male, master's student at Lanzhou University of Technology, E-mail: 1951557832@qq.com; FENG Tao (born 1970), male, professor, E-mail: fengt@lut.edu.cn; GUO Xian (born 1971), male, professor, E-mail: iamxg@163.com
  • Supported by:
    National Natural Science Foundation of China (61562059); National Natural Science Foundation of China (61461027); Natural Science Foundation of Gansu Province (20JR5RA467)

Abstract:

To protect the privacy of multi-classification logistic regression models in machine learning, ensure the quality of training data, and reduce computation and communication costs, a privacy-preserving multi-classification logistic regression scheme for data quality is proposed. First, based on homomorphic encryption for arithmetic of approximate numbers, batch processing and the single-instruction multiple-data (SIMD) mechanism are used to pack multiple messages into one ciphertext, and an encrypted vector is securely shifted into the ciphertext of the correspondingly shifted plaintext vector. Second, the binary logistic regression model is extended to multiple classes by training multiple classifiers under the "One vs Rest" decomposition strategy. Finally, the training data set is divided into several fixed-size matrices, which still retain the complete data structure of the sample information, and the fixed Hessian method is used to optimize the model parameters so that the scheme applies in any case and keeps the parameters private. During model training, the scheme mitigates data sparsity and ensures data quality. The security analysis shows that neither the trained model nor the users' data is leaked at any point in the process. The experiments show that the training accuracy of the scheme is considerably higher than that of existing schemes and almost the same as the accuracy obtained by training on unencrypted data, and that the scheme has a lower computation cost.

Key words: homomorphic encryption, cloud computing, logistic regression, privacy-preserving, data quality
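
The second and third steps of the abstract can be illustrated on unencrypted data. The following is a minimal plaintext sketch, not the authors' encrypted protocol: it trains one binary logistic regression classifier per class ("One vs Rest") and replaces the exact Hessian with the constant bound (1/4)XᵀX, which is the usual form of the fixed Hessian method. The function names, the small `ridge` term, and the iteration count are assumptions introduced for illustration, and no homomorphic encryption or data partitioning is involved.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fixed_hessian_inverse(X, ridge=1e-6):
    """Invert the constant bound (1/4) X^T X that stands in for the Hessian.

    The ridge term is an assumption added only to keep the matrix invertible.
    """
    d = X.shape[1]
    return np.linalg.inv(0.25 * X.T @ X + ridge * np.eye(d))


def train_binary(X, y, H_inv, iters=20):
    """Fit one binary classifier (y in {0, 1}) with fixed-Hessian updates."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (y - sigmoid(X @ beta))  # gradient of the log-likelihood
        beta = beta + H_inv @ grad            # same H_inv in every iteration
    return beta


def train_one_vs_rest(X, labels, num_classes, iters=20):
    """Train one 'class k vs the rest' classifier per class; returns a (K, d) array."""
    H_inv = fixed_hessian_inverse(X)  # depends only on X, so it is shared by all classes
    return np.stack([
        train_binary(X, (labels == k).astype(float), H_inv, iters)
        for k in range(num_classes)
    ])


def predict(B, X):
    """Assign each sample to the class whose classifier gives the highest score."""
    return np.argmax(X @ B.T, axis=1)
```

Because the bound depends only on the feature matrix, its inverse is computed once and reused for every iteration and every class. That property is a plausible reason for preferring this kind of update when the arithmetic has to run on ciphertexts, where matrix inversion is expensive.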

CLC Number:

  • TP309.2