Research And Optimization On Keyword Retrieval Algorithm For Big Data

Posted on:2017-02-27

Degree:Master

Type:Thesis

Country:China

Candidate:S Wang

Full Text:PDF

GTID:2348330503491936

Subject:Mathematics

Abstract/Summary:

Traditional retrieval models suffer from data redundancy, fuzzy matching, and a lack of effective results. To address these problems, this thesis draws on current research topics, including models and methods for heterogeneous data integration, redundant data elimination, efficient data classification, and keyword retrieval (KR). In a big data environment, it combines traditional technology with virtualization technology, Map-Reduce, and an improved vector space retrieval model to optimize the storage model, the classification algorithm, and the retrieval algorithm. The aim is to improve the operating efficiency of the algorithms and to provide users with a foundation platform that integrates data storage, data classification, and data retrieval. Taking the KR algorithm as the research object, the main research contents are as follows.

First, to provide good data storage in support of data retrieval, the thesis studies the design and optimization of the data model, the design of the data storage model in a cloud computing environment, an update algorithm based on file blocks, and a fault recovery mechanism based on cloud storage.

Second, to meet accurate retrieval requirements for different data, a parallel classification hybrid algorithm (PCHA) is proposed on the basis of the original classification algorithm. It merges an adjacent classification algorithm, used to classify large data with massive numbers of attributes, with Map-Reduce. This improves the modeling and prediction ability and the classification recognition rate of the original classification algorithm.

Third, a disorderly keyword retrieval algorithm (DKRA) is proposed based on research on traditional retrieval algorithms. It exploits the convenience and low complexity of vector retrieval model calculation, and its design introduces the K-D matrix structure and similarity calculation methods. A comparison with the method that obtains similarity by calculating keyword sequence weights shows the advantage of DKRA in computational efficiency.

Lastly, an orderly keyword retrieval algorithm (OKRA) is proposed on the basis of DKRA. OKRA uses the order of the retrieval keywords and defines the keyword retrieval step length, overall retrieval step length, relevant data retrieval step length, and overall retrieval data step length, together with a formula for the position matching degree (PMD). In the similarity calculation, the PMD is introduced to reduce the error rate caused by the order of the retrieval keywords. The algorithm can filter out useless data, reduce the time spent traversing the data set, and improve the quality of the relevant data returned.
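As a rough illustration of the order-insensitive, vector-space style of retrieval the abstract attributes to DKRA, the sketch below builds a small matrix of keyword counts per document and ranks documents by cosine similarity to the query. It is not the thesis's algorithm: reading "K-D matrix" as a keyword-document count matrix is an assumption, cosine similarity is a stand-in for the unspecified similarity calculation, and the names `build_kd_matrix`, `cosine`, and `retrieve` are hypothetical. The order-sensitive PMD used by OKRA is not modeled, because its formula is not given in the abstract.

```python
import math
from collections import Counter


def build_kd_matrix(docs):
    """Return (index, matrix); matrix[i][j] counts keyword j in document i.

    Assumption: "K-D matrix" is read here as a keyword-document count matrix.
    """
    vocab = sorted({word for doc in docs for word in doc.lower().split()})
    index = {word: j for j, word in enumerate(vocab)}
    matrix = []
    for doc in docs:
        row = [0] * len(vocab)
        for word, count in Counter(doc.lower().split()).items():
            row[index[word]] = count
        matrix.append(row)
    return index, matrix


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve(query, docs):
    """Rank documents by similarity to the query, ignoring keyword order.

    Returns (document index, score) pairs, best first; zero-score documents
    are filtered out.
    """
    index, matrix = build_kd_matrix(docs)
    query_vec = [0] * len(index)
    for word in query.lower().split():
        if word in index:  # keywords absent from every document are skipped
            query_vec[index[word]] += 1
    scored = [(i, cosine(query_vec, row)) for i, row in enumerate(matrix)]
    return sorted((s for s in scored if s[1] > 0), key=lambda s: -s[1])


docs = [
    "big data storage model",
    "keyword retrieval algorithm",
    "parallel classification of big data",
]
print(retrieve("big data", docs))
```

Because the query is reduced to a count vector, "big data" and "data big" produce identical rankings. That order-blindness is the behavior an order-aware method such as OKRA is described as correcting.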

Keywords/Search Tags:

big data, storage model, parallel classification, keyword retrieval, algorithm optimization

Related items

1	Research On Keyword Privacy Of Ciphertext Retrieval In The Cloud Storage
2	Research Of Optimization Methods On Automated Storage And Retrieval Systems
3	Research On Key Issues Of Secure Data Storage In Cloud Environment
4	Research On Model And Algorithm For Storage Assignment Optimization In Warehouse
5	Research On XML Data Management For Retrieval And Classification
6	Research On Multi-keyword Retrieval Over Encrypted Data
7	Research On Storage And Retrieval Optimization Of Big Data
8	Research On Multi-keyword Search Over Encryption Data In Cloud Storage
9	Research And Implementation Of The Legal Retrieval System For Natural Language
10	Research Of Privacy-Preserving Multi-Keyword Search Schema Over Encrypted Cloud Data