J4 ›› 2014, Vol. 41 ›› Issue (5): 148-154+160. doi: 10.3969/j.issn.1001-2400.2014.05.025

• Research Article •

A user anonymization method for personalized retrieval over big data

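KANG Haiyan1; XIONG Li2

  1. Department of Information Security, School of Information Management, Beijing Information Science and Technology University, Beijing 100192, China;
  2. Department of Mathematics and Computer Science, Emory University, Atlanta 30322, USA
  • Received: 2013-05-08 Online: 2014-10-20 Published: 2014-11-27
  • Corresponding author: KANG Haiyan
  • About the author: KANG Haiyan (1971-), male, professor, Ph.D., E-mail: kanghaiyan@126.com.
  • Supported by:

    Humanities and Social Sciences Project of the Ministry of Education (11YJC870011); National Natural Science Foundation of China (61370139); General Program of the Science and Technology Plan of the Beijing Municipal Education Commission (KM201211232014); National Key Technology R&D Program (2012BAH08B02, 2012JGZD07)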

Enhancing user privacy for personalized web search in big data

KANG Haiyan1; XIONG Li2

  1. School of Information Management, Beijing Information Science and Technology University, Beijing 100192, China;
  2. Department of Mathematics and Computer Science, Emory University, Atlanta 30322, USA
  • Received: 2013-05-08 Online: 2014-10-20 Published: 2014-11-27
  • Contact: KANG Haiyan

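Abstract:

To resolve the tension between the potential threat to user privacy posed by personalized retrieval over big data and the goal of improving personalized information retrieval performance, a user interest model anonymization method that combines differential privacy with the p-link technique is proposed. First, the users' quasi-identifiers are generalized and noise is added to satisfy the differential privacy requirement, maximizing query accuracy on the statistical database while minimizing the probability of identifying individuals and their attributes. Next, users are microaggregated by the similarity of their interests into equivalence groups that satisfy p-link, and the weights of the interest terms and the centroid of each resulting equivalence group are computed. Finally, the anonymized data are released. Extensive experiments show that the method, by combining the properties of differential privacy and p-link, anonymizes user interest models while leaving user interests essentially unchanged, so that it protects users' private information and preserves personalized retrieval performance.

Keywords: user interest model, anonymization, privacy protection, information security, differential privacy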
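
The first step of the method, generalizing quasi-identifiers and adding noise so that released statistics satisfy differential privacy, can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration rather than the authors' implementation: the quasi-identifier is taken to be an age, generalization means bucketing into ten-year ranges, the released statistic is a histogram of counts, and the noise is drawn from a Laplace distribution with scale 1/epsilon (the standard Laplace mechanism for a count with sensitivity 1). The function names `generalize_age`, `laplace_noise`, and `noisy_histogram` are hypothetical.

```python
import random
from collections import Counter


def generalize_age(age, width=10):
    """Generalize an exact age to a range label, e.g. 34 -> '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_histogram(ages, epsilon, width=10, max_age=100, rng=None):
    """Histogram over generalized ages with Laplace noise on every bucket.

    Assumption: adding or removing one user changes exactly one count by 1,
    so the sensitivity is 1 and the noise scale is 1 / epsilon. The bucket
    list is fixed in advance (not derived from the data) so that empty
    buckets also receive noise.
    """
    if rng is None:
        rng = random.Random()
    buckets = [generalize_age(low, width) for low in range(0, max_age, width)]
    # Clamp ages into [0, max_age - 1] so every record falls in a known bucket.
    counts = Counter(
        generalize_age(min(max(a, 0), max_age - 1), width) for a in ages
    )
    scale = 1.0 / epsilon
    return {b: counts[b] + laplace_noise(scale, rng) for b in buckets}


# Usage: a smaller epsilon means more noise and stronger privacy.
hist = noisy_histogram([23, 27, 34, 61], epsilon=0.5)
for bucket, value in hist.items():
    print(bucket, round(value, 2))
```

The trade-off named in the abstract shows up directly in `epsilon`: a larger value keeps the noisy counts close to the true ones (higher query accuracy), while a smaller value makes any single user's presence harder to infer.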

Abstract:

To resolve the conflict between the risk of leaking user privacy in personalized search over big data and the goal of improving personalized information retrieval performance, an anonymization method that combines differential privacy with the p-link technique is proposed. First, we generalize the quasi-identifiers and add noise to meet the differential privacy requirement. This step maximizes the query accuracy of the statistical database while minimizing the probability of identifying individual records. Second, we cluster user profiles by their similarity into equivalence groups that satisfy p-link, and we calculate the weights and the centroid of each equivalence group. Finally, we release the anonymized data. Experimental results demonstrate that the method, which integrates the characteristics of differential privacy and p-link, leaves users' interests essentially unchanged, and that it both protects users' privacy and preserves personalized retrieval performance.

Key words: user profile, anonymization, privacy protection, information security, differential privacy
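
The second step, grouping similar user profiles into equivalence groups and publishing each group's centroid, can be sketched as follows. The abstract does not define the p-link condition, so this sketch substitutes a simpler stand-in: greedy microaggregation into groups of at least `k` profiles, ranked by cosine similarity to a seed profile. That substitution, the representation of a profile as a term-to-weight dictionary, and the names `cosine`, `centroid`, and `microaggregate` are all assumptions for illustration, not the paper's algorithm.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse term -> weight profiles."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid(profiles):
    """Average weight of every term across the profiles of one group."""
    terms = set().union(*profiles)
    n = len(profiles)
    return {t: sum(p.get(t, 0.0) for p in profiles) / n for t in terms}


def microaggregate(profiles, k):
    """Greedily form groups of k similar profiles (stand-in for p-link).

    Returns (groups, centroids), where each group is a list of indices
    into `profiles`. Leftover profiles form the last group, which has
    between k and 2k - 1 members unless fewer than k profiles exist.
    """
    remaining = list(range(len(profiles)))
    groups = []
    while len(remaining) >= 2 * k:
        seed = remaining[0]
        # Rank the other profiles by similarity to the seed, most similar first.
        others = sorted(
            remaining[1:],
            key=lambda i: cosine(profiles[seed], profiles[i]),
            reverse=True,
        )
        group = [seed] + others[: k - 1]
        groups.append(group)
        remaining = [i for i in remaining if i not in group]
    if remaining:
        groups.append(remaining)
    return groups, [centroid([profiles[i] for i in g]) for g in groups]


# Usage: four toy profiles, two about sports and two about music.
profiles = [
    {"sports": 1.0, "news": 0.2},
    {"sports": 0.9, "news": 0.1},
    {"music": 1.0},
    {"music": 0.8, "film": 0.3},
]
groups, centroids = microaggregate(profiles, k=2)
print(groups)  # [[0, 1], [2, 3]]
```

Publishing the centroid in place of each member's own profile is what keeps interests roughly intact: every user in a group is represented by averaged term weights that stay close to their original profile when the group members are similar.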
