混洗差分隐私保护的度分布直方图发布算法

doi:10.19665/j.issn1001-2400.20230207

摘要/Abstract

摘要：

当前,基于中心化或本地差分隐私的图数据度分布直方图发布算法无法有效平衡发布数据的隐私保护程度及其可用性,且不能有效保护用户的身份隐私。针对该问题,在编码-混洗-分析框架下提出一种混洗差分隐私保护的度分布直方图发布算法。首先,设计混洗差分隐私图数据度分布直方图隐私保护框架,采取交互式用户分组、混洗器及方波本地加噪扰动机制降低编码器对分布式用户本地差分隐私加噪的噪声影响,并利用极大似然估计在分析器端对加噪后的度分布直方图进行数据矫正,从而提高数据效用;其次,提出具体的分布式用户分组、混洗差分隐私加噪和数据矫正算法,并证明其满足(ε,σ)-混洗差分隐私。实验和对比结果表明,所提算法能保护分布式用户隐私,在L₁距离、H距离和MSE多个指标度量下的数据效用比已有算法提升了26%以上,且具有较低的时间开销和稳定的数据效用表现,适用不同规模的图数据度分布直方图发布共享应用。

关键词: 隐私保护技术, 图结构, 混洗差分隐私, 度分布直方图发布, 数据效用

Abstract:

At present,the existing histogram publishing algorithms based on centralized or local differential privacy for graph data degree distribution can neither balance the privacy and utility of published data,nor preserve the identity privacy of end users.To solve this problem,a histogram publishing algorithm for degree distribution via shuffled differential privacy(SDP) is proposed under the framework of Encode-Shuffle-Analyze.First,a privacy preserving framework for histogram publishing of degree distribution is designed based on shuffled differential privacy.In this framework,the noisy impact that the encoder brings to distributed users is reduced by employing interactive user grouping,the shuffler and the square wave noise mechanism,while adding noise via local differential privacy.The noisy histogram of degree distribution is reconciled via the maximum likelihood estimation at the analyzer end,thus improving the utility of published data.Second,specific algorithms are proposed for concreting distributed user grouping,adding shuffled differential privacy noise and reconciling the noisy data,respectively.Furthermore,it is proved that these algorithms meet the requirement of(ε,σ)-SDP.Experiments and comparisons illustrate that the proposed algorithms can preserve the privacy of distributed users,and that the data utility is improved more than 26% with metrics in terms of L₁ distance,H distance and MSE in comparison with the existing related algorithms.The proposed algorithms also perform with a low overhead and stable data utility,and are suitable for publishing and sharing the histogram of degree distribution for different scales of graph data.

Key words: privacy-preserving techniques, graph structures, shuffled differential privacy, degree distribution histogram publishing, data utility

中图分类号:

TN918

丁红发,傅培旺,彭长根,龙士工,吴宁博. 混洗差分隐私保护的度分布直方图发布算法[J]. 西安电子科技大学学报, 2023, 50(6): 219-236.

DING Hongfa,FU Peiwang,PENG Changgen,LONG Shigong,WU Ningbo. Histogram publishing algorithm for degree distribution via shuffled differential privacy[J]. Journal of Xidian University, 2023, 50(6): 219-236.

图/表 7

图1

图2

表1

图3

图4

图5

图6

参考文献 28

[1]	HEIDARI S, SIMMHAN Y, CALHEIROS R N, et al. Scalable Graph Processing Frameworks:A Taxonomy and Open Challenges[J]. ACM Computing Surveys, 2018, 51(3):1-53.
[2]	NATIONAL RESEARCH COUNCIL, DIVISION ON ENGINEERING AND PHYSICAL SCIENCES, BOARD ON MATHEMATICAL SCIENCES AND THEIR APPLICATIONS, et al. Frontiers in Massive Data Analysis[R]. Washington: National Academies Press, 2013.
[3]	马帅, 刘建伟, 左信, 等. 图神经网络综述[J]. 计算机研究与发展, 2022, 59(1):47-80.
	MA Shuai, LIU Jianwei, ZUO Xin, et al. Survey on Graph Neural Network[J]. Chinese Journal of Computer Research and Development, 2022, 59(1):47-80.
[4]	刘宇涵, 陈红, 刘艺璇, 等. 图数据上的隐私攻击与防御技术[J]. 计算机学报, 2022, 45(4):702-734.
	LIU Yuhan, CHEN Hong, LIU Yixuan, et al. State-of-the-Art Privacy Attacks and Defenses on Graphs[J]. Chinese Journal of Computers, 2022, 45(4):702-734.
[5]	DWORK C. Differential Privacy[C]// International Colloquium on Automata,Languages,and Programming.Berlin:Springer, 2006:1-12.
[6]	徐花, 田有亮. 差分隐私下的权重社交网络隐私保护[J]. 西安电子科技大学学报, 2022, 49(1):17-25.
	XU Hua, TIAN Youliang. Protection of Privacy of the Weighted Social Network under Differential Privacy[J]. Journal of Xidian University, 2022, 49(1):17-25.
[7]	HAY M, LI C, MIKLAU G, et al. Accurate Estimation of the Degree Distribution of Private Networks[C]// 2009 Ninth IEEE International Conference on Data Mining.Piscataway:IEEE, 2009:169-178.
[8]	KASIVISWANATHAN S P, NISSIM K, RASKHODNIKOVA S, et al. Analyzing Graphs with Node Differential Privacy[C]// The 10th Theory of Cryptography Conference.Berlin:Springer, 2013:457-476.
[9]	DAY W Y, LI N, LYU M. Publishing Graph Degree Distribution with Node Differential Privacy[C]// The 2016 ACM SIGMOD International Conference on Management of Data. New York: ACM, 2016:123-138.
[10]	张宇轩, 魏江宏, 李霁, 等. 点差分隐私下图数据的度直方图发布方法[J]. 计算机研究与发展, 2019, 56(3):508-520.
	ZHANG Yuxuan, WEI Jianghong, LI Ji, et al. Graph Degree Histogram Publication Method with Node-Differential Privacy[J]. Chinese Journal of Computer Research and Development, 2019, 56(3):508-520.
[11]	AMIN F, MAJEED A, MATEEN A, et al. A Systematic Survey on the Recent Advancements in the Social Internet of Things[J]. IEEE Access, 2022, 10:63867-63884. doi: 10.1109/ACCESS.2022.3183261
[12]	WANG T, BLOCKI J, LI N, et al. Locally Differentially Private Protocols for Frequency Estimation[C]// 26th USENIX Security Symposium(USENIX Security 17).Berkeley:USENIX, 2017:729-745.
[13]	叶青青, 孟小峰, 朱敏杰, 等. 本地化差分隐私研究综述[J]. 软件学报, 2018, 29(7):1981-2005.
	YE Qingqing, MENG Xiaofeng, ZHU Minjie, et al. Survey on Local Differential Privacy[J]. Chinese Journal of Software, 2018, 29(7):1981-2005.
[14]	QIN Z, YU T, YANG Y, et al. Generating Synthetic Decentralized Social Graphs with Local Differential Privacy[C]// 2017 ACM SIGSAC Conference on Computer and Communications Security. New York: ACM, 2017:425-438.
[15]	WEI C, JI S, LIU C, et al. AsgLDP:Collecting and Generating Decentralized Attributed Graphs with Local Differential Privacy[J]. IEEE Transactions on Information Forensics and Security, 2020, 15:3239-3254. doi: 10.1109/TIFS.10206
[16]	LI Z, WANG T, LOPUHAÄ-ZWAKENBERG M, et al. Estimating Numerical Distributions under Local Differential Privacy[C]// The 2020 ACM SIGMOD International Conference on Management of Data. New York: ACM, 2020:621-635.
[17]	YE Q, HU H, AU M H, et al. LF-GDPR:A Framework for Estimating Graph Metrics with Local Differential Privacy[J]. IEEE Transactions on Knowledge and Data Engineering, 2020, 34(10):4905-4920. doi: 10.1109/TKDE.2020.3047124
[18]	IMOLA J, MURAKAMI T, CHAUDHURI K. Locally Differentially Private Analysis of Graph Statistics[C]// The 30th USENIX Security Symposium(USENIX Security 21).Berkeley:USENIX, 2021:983-1000.
[19]	BITTAU A, ERLINGSSON Ú, MANIATIS P, et al. Prochlo:Strong Privacy for Analytics in the Crowd[C]// The 26th Symposium on Operating Systems Principles. New York: ACM, 2017:441-459.
[20]	BALLE B, BELL J, GASCÓN A, et al. The Privacy Blanket of the Shuffle Model[C]// Cryptology-CRYPTO 2019-39th Annual International Cryptology Conference.Berlin:Springer, 2019:638-667.
[21]	WANG T, DING B, XU M, et al. Improving Utility and Security of the Shuffler-Based Differential Privacy[J]. VLDB Endowment, 2020, 13(13):3545-3558.
[22]	LI X, LIU W, ZI Y, et al. DUMP:A Dummy-Point-Based Framework for Histogram Estimation in Shuffle Model[J]. CoRR abs, 2020:2009.13738.
[23]	CHEU A. Differential Privacy in the Shuffle Model:A Survey of Separations[J]. CoRR abs, 2021:2107.11839.
[24]	张啸剑, 徐雅鑫, 夏庆荣, 等. 基于混洗差分隐私的直方图发布方法[J]. 软件学报, 2022, 33(6):2348-2363.
	ZHANG Xiaojian, XU Yaxin, XIA Qingrong, et al. Histogram Publication under Shuffled Differential Privacy[J]. Chinese Journal of Software, 2022, 33(6):2348-2363.
[25]	刘艺菲, 王宁, 王志刚, 等. 混洗差分隐私下的多维类别数据的收集与分析[J]. 软件学报, 2022, 33(3):1093-1110.
	LIU Yifei, WANG Ning, WANG Zhigang, et al. Collecting and Analyzing Multidimensional Categorical Data Under Shuffled Differential Privacy[J]. Chinese Journal of Software, 2022, 33(3):1093-1110.
[26]	ZHANG S, NI W, FU N. Differentially Private Graph Publishing with Degree Distribution Preservation[J]. Computers & Security, 2021, 106(6):102285. doi: 10.1016/j.cose.2021.102285
[27]	IMOLA J, MURAKAMI T, CHAUDHURI K. Differentially Private Triangle and 4-Cycle Counting in the Shuffle Model[C]// The 2022 ACM SIGSAC Conference on Computer and Communications Security. New York: ACM, 2022:1505-1519.
[28]	MCSHERRY F D. Privacy Integrated Queries:An Extensible Platform for Privacy-Preserving Data Analysis[C]// 2009 ACM SIGMOD International Conference on Management of Data. New York: ACM, 2009:19-30.

数据集	\|V\|	\|E\|	d_max
Twitter	81 306	1 768 149	3 383
Amazon	262 111	1 234 877	420
com-dblp.ungraph	317 080	1 049 866	343
gemsec-facebook_artist	50 515	819 306	1 469
gemsec_deezer_HU	47 538	222 887	112