
Research And Optimization On Keyword Retrieval Algorithm For Big Data

Posted on: 2017-02-27
Degree: Master
Type: Thesis
Country: China
Candidate: S Wang
GTID: 2348330503491936
Subject: Mathematics
Abstract/Summary:
To address the problems of the traditional retrieval model in data retrieval, such as data redundancy, fuzzy matching, and a lack of effective results, this thesis draws on current research hotspots, including models and methods for heterogeneous data integration, redundant data elimination, efficient data classification, and keyword retrieval (KR). In the big data environment, it aims to make full use of traditional technology together with virtualization technology, Map-Reduce, and an improved vector space retrieval model to optimize the storage model, the classification algorithm, and the retrieval algorithm. The goal is to improve the operating efficiency of the algorithms and to provide users with a foundational data retrieval platform that integrates data storage, data classification, and data retrieval. Taking the KR algorithm as the research object, the main research contents are as follows:

First, to provide good data storage as a basis for data retrieval, the design and optimization of the data model are studied, along with the design of the data storage model in a cloud computing environment, an update algorithm based on file blocks, and a fault recovery mechanism based on cloud storage.

Second, to meet the accurate retrieval requirements of different data, a parallel classification hybrid algorithm (PCHA) is proposed on the basis of the original classification algorithm. In PCHA, the adjacent classification algorithm, used to classify large data with massive numbers of attributes, is merged with Map-Reduce. This improves the modeling and prediction ability and the classification recognition rate of the original classification algorithm.

Third, a disorderly keyword retrieval algorithm (DKRA) is proposed based on research into traditional retrieval algorithms. It exploits the convenience and low complexity of vector retrieval model calculation and introduces the K-D matrix structure and similarity calculation methods in its design. A comparison with the method that obtains similarity by calculating keyword sequence weights shows the advantage of DKRA in computational efficiency.

Lastly, an orderly keyword retrieval algorithm (OKRA) is proposed on the basis of DKRA. OKRA uses the order of the retrieval keywords and defines the keyword retrieval step length, overall retrieval step length, relevant data retrieval step length, overall retrieval data step length, and a formula for calculating the position matching degree (PMD). In the similarity calculation, the PMD is introduced to reduce the error rate caused by the retrieval order in KR. The algorithm can be used to filter out useless data, reduce the time spent traversing the data set, and improve the quality of the returned retrieval results.
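
The abstract does not give the thesis's formulas, so the following is only a minimal sketch of the general idea behind order-insensitive retrieval with a vector space model. It assumes that "K-D matrix" means a keyword-by-document count matrix and uses plain cosine similarity; the function names (`build_kd_matrix`, `cosine_similarity`, `rank_documents`) and the whitespace tokenization are hypothetical choices for illustration, not the thesis's DKRA.

```python
import math
from collections import Counter


def build_kd_matrix(documents, vocabulary):
    """Build a keyword-by-document matrix of raw term counts.

    Assumption: rows are keywords, columns are documents. The thesis's
    actual K-D matrix layout and weighting are not given in the abstract.
    """
    counts = [Counter(doc.lower().split()) for doc in documents]
    return [[c[term] for c in counts] for term in vocabulary]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_documents(query, documents):
    """Return (document index, score) pairs sorted by descending similarity."""
    query_counts = Counter(query.lower().split())
    # Vocabulary covers every token seen in the documents or the query.
    vocabulary = sorted(
        set(query_counts) | {t for d in documents for t in d.lower().split()}
    )
    matrix = build_kd_matrix(documents, vocabulary)
    query_vec = [query_counts[term] for term in vocabulary]
    scores = []
    for j in range(len(documents)):
        doc_vec = [row[j] for row in matrix]  # column j of the matrix
        scores.append((j, cosine_similarity(query_vec, doc_vec)))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    docs = [
        "big data keyword retrieval",
        "retrieval keyword data big",
        "cloud storage model",
    ]
    for index, score in rank_documents("keyword retrieval", docs):
        print(index, round(score, 4))
```

In this sketch the first two documents receive identical scores even though their keywords appear in opposite order, because a count-based vector discards word positions. That is the limitation an order-aware measure such as the PMD described for OKRA is meant to address.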
Keywords/Search Tags: big data, storage model, parallel classification, keyword retrieval, algorithm optimization