
Secure Ranked Search Based On Simhash Over Encrypted Data

Posted on: 2020-08-07
Degree: Master
Type: Thesis
Country: China
Candidate: Z Li
GTID: 2428330623967008
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet and the exponential growth of data, more and more users choose to store their data on cloud servers. Cloud storage relieves users of the burden of storage management and provides them with flexible cloud computing services. However, as a third-party platform, a cloud service provider can easily access users' data, which can lead to privacy leakage and other security issues. Therefore, to prevent unauthorized access and protect their privacy and data security, users usually encrypt their data before uploading it to the cloud server. Because encrypted data loses its plaintext features, users cannot retrieve it efficiently. Ensuring both the confidentiality and the availability of data has therefore become the key issue in the development of ciphertext retrieval technology for cloud storage. This thesis proposes a multi-keyword ranked ciphertext search scheme that supports dynamic index updates, studied from three aspects: construction and updating of the secure index, multi-keyword ranked ciphertext search, and security analysis. The main research contents are as follows:

(1) Construction of the SMRI ciphertext index
After studying the index structures commonly used in ciphertext retrieval, and to address the large space occupation and complex construction of high-dimensional vector indexes, this thesis proposes a secure multi-keyword ranked search index (SMRI) based on the dimensionality-reduction idea of Simhash. First, each document is processed into a feature vector according to TF-IDF rules. Then, using the keyed one-way hash function HMAC-MD5, the Simhash algorithm reduces each document to a low-dimensional feature fingerprint, and the fingerprints are segmented and pre-indexed into groups. Finally, a B+ tree is built from the pairs formed by the segmented fingerprints and the feature vectors, and it is encrypted with the SkNN algorithm to obtain a secure ciphertext index. When the data changes, users only need to update the corresponding pairs and submit the update information to the cloud server, which uses the properties of the B+ tree itself to complete the dynamic update of the index.

(2) Multi-keyword ranked search based on SMRI
A secure trapdoor is generated from the query keywords submitted by the user. The TF-IDF-based vector space model and the keyed Simhash algorithm turn the query keywords into a query vector and a query fingerprint; the vector is encrypted with the SkNN algorithm, and together they form the query trapdoor that is submitted to the cloud. To address the high computational complexity and low ranking accuracy of relevance-score computation during retrieval, this thesis designs a ranking scheme based on a "filter-refine" strategy. First, the cloud server uses the query fingerprint in the trapdoor to find all fingerprint sets in SMRI whose Hamming distance from it is below a threshold, filtering out a large number of documents with low relevance to the query keywords and leaving a set of candidate results. Then, following the predefined TF-IDF rules, it computes the inner product between each candidate's feature vector and the query vector and sorts the candidates precisely to obtain the top-k results. The whole process is completed on the cloud server, which reduces communication with the client.

(3) Security analysis and experimental design
First, this thesis summarizes the common attack modes in ciphertext retrieval, establishes threat models, defines security objectives, and then analyzes the security of the scheme with respect to protecting data documents, keywords, relevance scores and query privacy. Finally, using the RFC collection as the experimental data set, it compares different schemes in terms of the efficiency of index construction, trapdoor generation, index retrieval and index updating, as well as retrieval accuracy. The experimental results show that SMRI achieves high retrieval efficiency at low computational cost, saves both time and space, and is suitable for fast retrieval over massive data.
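The fingerprinting step in (1) can be illustrated with a minimal sketch. It assumes a 64-bit fingerprint taken from the top bits of the 128-bit HMAC-MD5 digest, a smoothed TF-IDF weighting, and pre-tokenized documents; the thesis's exact fingerprint length and TF-IDF rule are not stated in the abstract, and the function names and key below are hypothetical.

```python
import hashlib
import hmac
import math
from collections import Counter

FP_BITS = 64  # assumed fingerprint length (HMAC-MD5 yields 128 bits)


def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenized document.

    Uses TF * smoothed IDF, log((1 + n) / (1 + df)) + 1, so every weight is
    positive; this is an assumption, and the thesis's TF-IDF rule may differ.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1)
            for t, c in tf.items()
        })
    return vectors


def keyed_simhash(weights, key, bits=FP_BITS):
    """Simhash in which each term's hash is HMAC-MD5(key, term).

    Keying the hash means a fingerprint cannot be recomputed from guessed
    keywords without the secret key.
    """
    acc = [0.0] * bits
    for term, w in weights.items():
        digest = hmac.new(key, term.encode("utf-8"), hashlib.md5).digest()
        h = int.from_bytes(digest, "big") >> (128 - bits)  # keep the top `bits` bits
        for i in range(bits):
            # Each set bit votes +w, each clear bit votes -w.
            acc[i] += w if (h >> i) & 1 else -w
    # Bit i of the fingerprint is 1 when the weighted vote for it is positive.
    return sum(1 << i for i in range(bits) if acc[i] > 0)


docs = [
    ["cloud", "encrypted", "search", "ranking"],
    ["cloud", "encrypted", "search", "index"],
    ["b", "tree", "node", "split"],
]
key = b"hypothetical-secret-key"
fingerprints = [keyed_simhash(v, key) for v in tfidf_vectors(docs)]
```

As with Simhash in general, documents that share most of their heavily weighted terms tend to receive fingerprints with a small Hamming distance, which is the property the filtering stage relies on.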
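The "filter-refine" ranking in (2) can be sketched in plaintext under several assumptions: 64-bit fingerprints split into four 16-bit segments, a Hamming threshold of 4 with a strict "less than" test, a Python dict standing in for the B+ tree, and unencrypted TF-IDF vectors in place of SkNN-encrypted ones. Segment probing uses the pigeonhole argument common in Simhash near-duplicate detection: if two fingerprints differ in fewer than four bits, at least one of their four segments is identical. This is an illustration of the strategy, not the thesis's implementation, and all names are hypothetical.

```python
import heapq
from collections import defaultdict

SEGMENTS = 4    # assumed: 64-bit fingerprint split into four 16-bit segments
SEG_BITS = 16
THRESHOLD = 4   # assumed: keep fingerprints with Hamming distance < 4


def segments(fp):
    """Split a fingerprint into (segment_index, segment_value) keys."""
    mask = (1 << SEG_BITS) - 1
    return [(i, (fp >> (i * SEG_BITS)) & mask) for i in range(SEGMENTS)]


def build_segment_index(fingerprints):
    """Map each (segment_index, value) key to the ids of matching documents.

    A dict stands in for the B+ tree used in the thesis.
    """
    index = defaultdict(set)
    for doc_id, fp in fingerprints.items():
        for seg in segments(fp):
            index[seg].add(doc_id)
    return index


def hamming(a, b):
    return bin(a ^ b).count("1")


def filter_refine(query_fp, query_vec, index, fingerprints, vectors, k):
    # Filter: a fingerprint fewer than SEGMENTS bits away shares at least one
    # segment exactly with the query, so probing every segment finds it.
    candidates = set()
    for seg in segments(query_fp):
        candidates |= index.get(seg, set())
    candidates = {d for d in candidates
                  if hamming(fingerprints[d], query_fp) < THRESHOLD}
    # Refine: exact relevance as an inner product, computed for candidates only.
    scored = ((sum(w * query_vec.get(t, 0.0) for t, w in vectors[d].items()), d)
              for d in candidates)
    return [d for _, d in heapq.nlargest(k, scored)]


fps = {0: 0x0, 1: 0b111, 2: (1 << 64) - 1, 3: 0b1111}
vecs = {0: {"a": 1.0}, 1: {"a": 2.0}, 2: {"a": 10.0}, 3: {"a": 5.0}}
index = build_segment_index(fps)
top = filter_refine(0, {"a": 1.0}, index, fps, vecs, k=2)
# top == [1, 0]: doc 2 is never probed, and doc 3 is 4 bits away, so it is
# discarded before scoring despite its high inner product.
```

The design choice shows why filtering pays off: the expensive inner product is evaluated only for the few documents that survive the cheap bitwise test.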
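The SkNN encryption used for both the index vectors and the query vector can be illustrated with the standard secure kNN matrix-splitting construction; the abstract does not give the thesis's exact variant, so this is an assumption. The secret key is a random bit vector S and two invertible matrices M1 and M2. An index vector p is split into random shares where S[j] = 1, a query vector q where S[j] = 0, and the server's score (M1^T p')·(M1^-1 q') + (M2^T p'')·(M2^-1 q'') equals the plaintext inner product p·q. The sketch is pure Python with floating-point arithmetic, omits the extra random dimensions that practical schemes typically add, and uses hypothetical function names.

```python
import random


def mat_vec(m, v):
    """Compute M v for a square matrix given as a list of rows."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]


def mat_t_vec(m, v):
    """Compute M^T v."""
    n = len(v)
    return [sum(m[i][j] * v[i] for i in range(n)) for j in range(n)]


def invert(m):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [x / p for x in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]


def keygen(n, rng):
    """Secret key: split indicator S, random matrices M1, M2 and their inverses."""
    s = [rng.randint(0, 1) for _ in range(n)]
    m1 = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
    m2 = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
    return s, m1, m2, invert(m1), invert(m2)


def split(v, s, split_on, rng):
    """Randomly split v[j] into two shares where s[j] == split_on; copy it otherwise."""
    a, b = [], []
    for x, bit in zip(v, s):
        if bit == split_on:
            r = rng.uniform(-1, 1)
            a.append(r)
            b.append(x - r)
        else:
            a.append(x)
            b.append(x)
    return a, b


def encrypt_index(p, key, rng):
    s, m1, m2, _, _ = key
    p1, p2 = split(p, s, 1, rng)
    return mat_t_vec(m1, p1), mat_t_vec(m2, p2)


def encrypt_query(q, key, rng):
    s, _, _, m1_inv, m2_inv = key
    q1, q2 = split(q, s, 0, rng)
    return mat_vec(m1_inv, q1), mat_vec(m2_inv, q2)


def score(enc_p, enc_q):
    """Server-side relevance score; equals p . q without revealing p or q."""
    return (sum(x * y for x, y in zip(enc_p[0], enc_q[0]))
            + sum(x * y for x, y in zip(enc_p[1], enc_q[1])))


rng = random.Random(7)
key = keygen(4, rng)
p = [0.5, 0.0, 1.2, 0.3]   # document feature vector
q = [1.0, 0.0, 0.5, 0.0]   # query vector
s = score(encrypt_index(p, key, rng), encrypt_query(q, key, rng))
# s matches the plaintext inner product 1.1 up to floating-point error.
```

Because the random shares cancel only inside the inner product, the server can rank candidates by score without learning the individual vector entries.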
Keywords/Search Tags: Ciphertext Retrieval, Multi-keyword Ranked Search, Simhash, Vector Space Model, Secure k-Nearest Neighbor, Privacy Protection