Journal of Xidian University ›› 2024, Vol. 51 ›› Issue (1): 187-200. DOI: 10.19665/j.issn1001-2400.20230205

• Cyberspace Security •

Deduplication scheme with data popularity for cloud storage

HE Xinfeng1,2, YANG Qinqin1,2

  1. School of Cyberspace Security and Computer, Hebei University, Baoding 071002, China
  2. Key Lab of High Trusted Information System of Hebei Province, Baoding 071002, China
  • Received: 2022-10-25  Online: 2024-01-20  Published: 2023-08-30
  • Corresponding author: YANG Qinqin (1995-), female, M.S. candidate at Hebei University. E-mail: yangqinqin202207@163.com
  • About the author: HE Xinfeng (1976-), male, associate professor. E-mail: popsoda@126.com
  • Supported by: the Natural Science Foundation of Hebei Province (F2021201049)


Abstract:

With the development of cloud computing, more and more enterprises and individuals tend to outsource their data to cloud storage providers to relieve the local storage pressure, so the storage pressure on the cloud side is becoming an increasingly prominent issue. To improve storage efficiency and reduce the communication cost, data deduplication technology has been widely used. Existing deduplication techniques mainly include identical-data deduplication based on hash tables and similar-data deduplication based on Bloom filters, but both of them rarely consider the impact of data popularity. In practice, the data outsourced to cloud storage can be divided into popular and unpopular data according to their access frequency. Popular data are accessed frequently and have numerous duplicate copies and similar data in the cloud, so high-accuracy deduplication is required. Unpopular data, which are rarely accessed, have fewer duplicate copies and similar data in the cloud, so low-accuracy deduplication can meet the demand. To address this problem, a novel Bloom filter variant named PDBF (popularity dynamic Bloom filter) is proposed, which incorporates data popularity into the Bloom filter, and a PDBF-based deduplication scheme is constructed that dynamically adjusts the deduplication accuracy according to how popular a datum is. Simulation results demonstrate that the scheme achieves a good tradeoff among computational time, memory consumption, and the false positive rate.
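The abstract describes the PDBF only at a high level, so the following is a minimal, hypothetical sketch of the underlying idea rather than the authors' construction: access counts classify a fingerprint as popular or unpopular, and the fingerprint is then checked against either a high-precision (low false-positive-rate) Bloom filter or a smaller, low-precision one. The class names, the popularity threshold, and the filter capacities and false-positive rates below are illustrative assumptions.

```python
# Sketch only: popularity-aware Bloom-filter routing for deduplication checks.
# Parameters (threshold, capacities, false-positive rates) are illustrative.
import hashlib
import math


class SimpleBloomFilter:
    """Plain Bloom filter; the k bit positions per item are derived from SHA-256."""

    def __init__(self, capacity: int, fp_rate: float):
        # Standard sizing: m = -n*ln(p)/(ln 2)^2 bits, k = (m/n)*ln 2 hash functions.
        self.m = max(8, int(-capacity * math.log(fp_rate) / (math.log(2) ** 2)))
        self.k = max(1, round(self.m / capacity * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: bytes):
        for i in range(self.k):
            digest = hashlib.sha256(i.to_bytes(4, "big") + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


class PopularityDedup:
    """Routes each fingerprint to a filter whose precision matches its popularity."""

    def __init__(self, popularity_threshold: int = 3):
        self.threshold = popularity_threshold                                # assumed popular/unpopular cut-off
        self.access_count = {}                                               # fingerprint -> observed accesses
        self.popular = SimpleBloomFilter(capacity=100_000, fp_rate=0.001)    # high-precision filter
        self.unpopular = SimpleBloomFilter(capacity=100_000, fp_rate=0.05)   # low-precision filter

    def is_duplicate(self, data: bytes) -> bool:
        fp = hashlib.sha256(data).digest()
        count = self.access_count.get(fp, 0) + 1
        self.access_count[fp] = count
        target = self.popular if count >= self.threshold else self.unpopular
        # Check both filters; a hit may be a false positive, but an inserted item is never missed.
        seen = fp in self.popular or fp in self.unpopular
        target.add(fp)
        return seen


if __name__ == "__main__":
    dedup = PopularityDedup()
    print(dedup.is_duplicate(b"block-A"))   # False: first upload of this block
    print(dedup.is_duplicate(b"block-A"))   # True: duplicate detected on re-upload
```

A complete design would also have to maintain the access statistics on the storage server and migrate entries whose popularity crosses the threshold; the sketch only shows how popularity can steer the precision of the membership test.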

Key words: cloud computing, cloud storage, data deduplication, data popularity, Bloom filter

CLC number:

  • TP309.2