Research On Ciphertext Search Method Based On Document Feature Keywords Matching

Posted on: 2022-02-10 | Degree: Master | Type: Thesis
Country: China | Candidate: H Xu | Full Text: PDF
GTID: 2518306548961229 | Subject: Master of Engineering
Abstract/Summary:
In the era of big data and the Internet, more and more users and enterprises store their data on cloud servers to relieve the pressure on local storage and to share data conveniently and quickly, thereby obtaining efficient cloud storage and management services. However, data stored directly in the cloud may leak private information. Data therefore needs to be encrypted before it is uploaded, and because plaintext search schemes no longer apply to ciphertext, searchable encryption technology came into being. As the amount of data stored by users and enterprises grows, existing searchable encryption schemes face a tedious encryption process, high time cost, and low retrieval efficiency. To address these problems, the main work of this thesis is as follows.

First, a ranked search method that matches document features against joint keywords is proposed. Several representative feature keywords are extracted from each document, and the feature keywords of all documents are collected and reprocessed to form a keyword set. A random algorithm then groups every d keywords of the keyword set into a joint keyword, and all the joint keywords form a keyword dictionary. The dimension of the dictionary is thereby reduced from the original n to t = [n/d]. Before a document is uploaded to the cloud server, an index vector is created for it, in which each dimension holds the weighted score obtained by matching the document's feature keywords against the corresponding joint keyword in the dictionary. At query time, the query keywords are matched against each joint keyword in the dictionary and the weighted scores are calculated to create a query vector; the dimension of the dictionary thus also determines the dimension of the index and query vectors. Finally, the cloud server uses the BM25 model to calculate the inner product of each document index vector and the query vector, sorts the inner product values in descending order, and returns the first k (Top-k) results. This method improves retrieval accuracy, turns a high-dimensional encryption process into a low-dimensional one, simplifies encryption, and improves retrieval efficiency.

Second, this thesis proposes a fast dimensionality-reduction ranked search method based on document feature matching, in which the extracted document feature keywords directly make up the feature dictionary. To solve the problem that, during keyword matching, the scores of most keywords are lower than those of a few keywords, the method introduces a feature score algorithm and a query matching algorithm. Before a document is uploaded to the cloud server, the feature score algorithm calculates the score of the document's feature keywords against each feature in the feature dictionary, and the index feature vector of the document is created. At query time, the matching score algorithm calculates the score of the query keywords against each feature in the feature dictionary, and the query matching vector is created. The method then uses the K-L transform to reduce the dimension of the m document index feature vectors and the query matching vector, so that the main information is retained in a few dimensions to the greatest possible extent. Finally, the cloud server calculates the inner product from the scores of the index feature vector and the query matching vector under different matching conditions, sorts the inner product values in descending order, and returns the results. This improves ranking accuracy, reduces the computational complexity of encryption, and improves retrieval efficiency.

Both proposed schemes are analyzed in terms of privacy protection, accuracy, complexity, and efficiency, and are verified experimentally. The analysis and the experimental results show that the two schemes reduce, to some extent, the computational complexity of the encryption process and of the inner product, save index storage space, and improve ranking accuracy and retrieval efficiency.
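
The joint-keyword idea in the first scheme can be illustrated with a small plaintext sketch. It is not the thesis's implementation: encryption is omitted entirely, the function names (`build_joint_dictionary`, `to_vector`, `top_k`) are hypothetical, the bracket in t = [n/d] is assumed to round up, and the weighted score is replaced by a simple sum of keyword weights because the abstract does not give the actual formula.

```python
import random


def build_joint_dictionary(keywords, d, seed=0):
    """Randomly partition the keyword set into joint keywords of size d.

    Assumption: the last group may be smaller when d does not divide n.
    """
    pool = sorted(set(keywords))        # deduplicate; sort for reproducibility
    random.Random(seed).shuffle(pool)   # stand-in for the "random algorithm"
    return [frozenset(pool[i:i + d]) for i in range(0, len(pool), d)]


def to_vector(weights, dictionary):
    """Map {keyword: weight} to one score per joint keyword.

    Placeholder scoring (assumption): sum the weights of the keywords
    that fall inside each joint keyword.
    """
    return [sum(w for kw, w in weights.items() if kw in joint)
            for joint in dictionary]


def top_k(index_vectors, query_vector, k):
    """Rank documents by inner product with the query, in descending order."""
    scores = {
        doc_id: sum(a * b for a, b in zip(vec, query_vector))
        for doc_id, vec in index_vectors.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# n = 7 keywords grouped with d = 3 gives a dictionary of t = 3 joint keywords.
keywords = ["cloud", "index", "cipher", "vector", "query", "rank", "score"]
dictionary = build_joint_dictionary(keywords, d=3)

# Hypothetical per-document feature keyword weights.
docs = {
    "doc1": {"cloud": 2.0, "index": 1.0},
    "doc2": {"cipher": 1.0},
}
index_vectors = {doc_id: to_vector(w, dictionary) for doc_id, w in docs.items()}
query_vector = to_vector({"cloud": 1.0}, dictionary)

print(len(dictionary), top_k(index_vectors, query_vector, k=1))
```

Every index and query vector here has t dimensions rather than n. That is the property the abstract relies on to make the subsequent encryption and inner-product steps cheaper.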
Keywords/Search Tags: searchable encryption, feature matching, joint keywords, dimensionality reduction, index feature vector, query matching vector