Research On Data Security Deduplication In Cloud Storage

Posted on:2022-08-18

Degree:Master

Type:Thesis

Country:China

Candidate:X L Mu

Full Text:PDF

GTID:2518306566991059

Subject:Computer technology

Abstract/Summary:


With the rapid development of the Internet, the volume of data has grown enormously. Managing and storing this data costs users considerable labor and material resources, which has driven the adoption of cloud storage. Because many users hold identical data, the cloud ends up storing massive amounts of duplicate data, which causes redundancy, wastes cloud storage resources, and greatly reduces the efficiency of data transmission. Data deduplication saves storage resources and bandwidth, but it also raises problems such as low deduplication efficiency and data leakage:

(1) Deduplication gives rise to many scheduling conflicts on the server. The question is how to resolve these conflicts and improve the efficiency of deduplication operations while protecting user data privacy. Existing deduplication schemes do not consider this practical problem, so their efficiency leaves room for improvement.

(2) Existing schemes do not consider whether ordinary deduplication methods leak the internal data of users who belong to special groups.

To address these problems, this thesis proposes two schemes.

First, a deduplication operation scheduling scheme based on LSTM networks is proposed. It resolves the scheduling conflicts that arise during deduplication and improves the efficiency of deduplication operations while protecting user data privacy. For the first time, this work explores improving deduplication efficiency by resolving machine scheduling conflicts. A predictor based on a long short-term memory (LSTM) network is trained to predict the server's future scheduling situation from the historical operations of the cloud server. The scheme generates scheduling predictions, recommends an executable operation sequence based on them, schedules server processes accordingly, and performs deduplication operations following that sequence.

Second, a group user data deduplication scheme based on density clustering is proposed. It addresses the problem that group users, i.e. users with highly similar attributes, affect the popularity threshold when they upload data, and it avoids the data leakage that the cloud server can cause when deduplicating group data. The scheme also supports data recovery, helping users recover their data after data loss. Group users and individual users are classified by their numerical user attributes, and the classification results are used to identify the group of each subsequent new uploader. Combined with the popularity threshold, counts are dynamically maintained and updated so that newly uploading users do not change the current popularity threshold, which preserves data security.
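As a rough illustration of the group-aware popularity counting described in the abstract, the sketch below clusters users by numerical attributes with a minimal DBSCAN-style density clustering and then counts all members of one group as a single uploader when measuring a file's popularity. The abstract does not give the thesis's actual algorithm, so everything here is an assumption made for illustration: the function names `dbscan` and `popularity`, the parameters `eps` and `min_pts`, the sample attribute vectors, and the rule that a group counts once.

```python
from math import dist


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN. Returns one label per point; -1 marks noise.

    Assumption: noise points stand for individual users, and every
    cluster stands for one user group.
    """
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = [j for j in range(len(points))
                     if dist(points[i], points[j]) <= eps]
        if len(neighbors) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue  # already assigned, do not expand again
            labels[j] = cluster
            nbrs = [k for k in range(len(points))
                    if dist(points[j], points[k]) <= eps]
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # j is a core point, keep expanding
    return labels


def popularity(uploader_ids, labels):
    """Count distinct uploaders, treating each group as one uploader (assumed rule)."""
    keys = set()
    for u in uploader_ids:
        g = labels[u]
        keys.add(("user", u) if g == -1 else ("group", g))
    return len(keys)


# Hypothetical numerical attribute vectors: users 0-3 are similar, 4 and 5 are not.
users = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (1.0, 1.2), (5.0, 5.0), (9.0, 1.0)]
labels = dbscan(users, eps=0.5, min_pts=3)

THRESHOLD = 3  # hypothetical popularity threshold
internal_file = popularity([0, 1, 2, 3], labels)  # uploaded only inside the group
shared_file = popularity([0, 1, 4, 5], labels)    # uploaded by group and outsiders
print(internal_file >= THRESHOLD, shared_file >= THRESHOLD)
```

Under this assumed rule, a file uploaded only by members of one group never reaches the threshold through that group alone, while a file shared across unrelated users still can.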

Keywords/Search Tags:

Data-Deduplication, Scheduling Optimization, Prediction Model, Attribute Similarity, Popularity Threshold


Related items

1	Popularity Prediction Based On Microblog Mining
2	Research On Deduplication Protocol Over Encrypted Data
3	Research On Security Deduplication Technology Of Cloud Storage Encrypted Data
4	Research On File Attribute-aware Data Security Deduplication Strategy
5	Media Popularity Prediction Algorithm Based On Multiple Attributions
6	Data Attribute-based Prediction Models Improvement Research In WSN
7	Collaborative Filtering Based On Item Popularity Weighting Research And Development Of Recommendation System
8	Research On Performance Optimization Of Virtual Machine Image Deduplication For Cloud Data Center
9	A Research On Popularity Prediction Of Tourist Attractions Based On Multi-source Heterogeneous User-Generated Data
10	Research Of Data Deduplication In Data Disaster Tolerance Systems