
Privacy Preserving Query Processing Over Unstructured Big Data

Posted on: 2020-11-09, Degree: Master, Type: Thesis
Country: China, Candidate: W L Yang, Full Text: PDF
GTID: 2428330590458393, Subject: Computer software and theory
Abstract/Summary:
With information now spreading widely through society, big data has become an indispensable product of the times. Its advantages are increasingly evident, which motivates efficient technologies for processing it, including data storage, query, and analysis. Among these, big data query plays an important role. The benefits of big data, however, make personal privacy harder to protect: illegal disclosure of sensitive personal information happens frequently and puts privacy at risk, so privacy preservation urgently needs more attention.

This thesis observes that most research on similarity joins focuses on reducing running time. As an essential operator in data mining and analysis, the similarity join is resource-intensive and time-consuming, particularly on big data. Data confidentiality must also be ensured, because joining two files may disclose personal information. Based on these considerations, this thesis proposes a MapReduce-based similarity join with differential privacy (hereafter PSJoin). The parallel algorithm is designed to answer similarity join queries privately and efficiently. Specifically, PSJoin preserves privacy both during the similarity join process and in the published results. A new private global ordering approach addresses disclosure during the join process, and a differentially private similarity function is provided, which is proved to satisfy differential privacy. Finally, these methods are embedded in the MapReduce framework to address the bottleneck of big data queries.

Comprehensive experiments and analysis on large-scale real-world datasets demonstrate that the method effectively prevents privacy leakage in similarity joins. Compared with traditional similarity joins, the privacy-preserving similarity join can further improve query efficiency by adjusting fixed parameters, guaranteeing privacy with only minimal accuracy loss in similarity queries while consistently offering good scalability.
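
The abstract does not give the form of the differentially private similarity function, so the following is only a minimal sketch of the general idea and not PSJoin itself. It assumes Jaccard similarity over token sets, perturbs the overlap count with Laplace noise (assuming that count has sensitivity 1), treats set sizes as public, and uses a plain nested loop in place of MapReduce. All function names are hypothetical, and `epsilon` here is a per-comparison budget with no accounting across pairs.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def noisy_jaccard(a, b, epsilon, rng):
    """Jaccard similarity computed from a Laplace-perturbed overlap count.

    Assumption: the overlap count |a & b| has sensitivity 1, so the
    Laplace scale is 1 / epsilon. Set sizes are treated as public.
    """
    overlap = len(a & b)
    noisy_overlap = overlap + laplace_noise(1.0 / epsilon, rng)
    # Clamping is post-processing, so it does not weaken the noise guarantee.
    noisy_overlap = min(max(noisy_overlap, 0.0), float(min(len(a), len(b))))
    union = len(a) + len(b) - noisy_overlap
    return noisy_overlap / union if union > 0 else 0.0


def private_similarity_join(left, right, threshold, epsilon, seed=None):
    """Return (left_id, right_id) pairs whose noisy similarity meets the threshold.

    Illustrative nested-loop join; a real system would partition the
    candidate pairs across mappers and reducers.
    """
    rng = random.Random(seed)
    return [
        (i, j)
        for i, a in left.items()
        for j, b in right.items()
        if noisy_jaccard(a, b, epsilon, rng) >= threshold
    ]
```

A smaller `epsilon` adds more noise and so trades join accuracy for stronger privacy, which is the kind of trade-off the abstract refers to when it mentions minimal accuracy loss.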
Keywords/Search Tags: Big Data Query, Privacy Preserving, Similarity Joins, MapReduce, Differential Privacy