Research on Fast Query Algorithm of Massive Data

Posted on: 2013-02-13
Degree: Master
Type: Thesis
Country: China
Candidate: X Ceng
Full Text: PDF
GTID: 2218330371457606
Subject: Computer software and theory
Abstract/Summary:
With the rise of new applications such as blogs, wikis, shared spaces, and Twitter, the Internet has entered an era of information explosion, in which ever more data must be processed and the requirements on data processing keep rising. Faced with this growth, fast query algorithms for massive data have become a hot research topic.

The thesis studies existing massive data query technologies in order to improve the efficiency of massive data queries, and focuses on Top-k query algorithms. A Top-k query computes a grade for each attribute according to the user's request, combines the attribute grades into an overall grade using an aggregation function, and returns the k objects with the highest overall grades. Top-k queries offer good query efficiency in massive data environments.

The thesis first describes existing massive data query technologies, such as indexing, SQL statement optimization, data prefetching, approximate matching, and distributed query, and summarizes the application scope of each. It then proposes a new algorithm called TABE (Top-k Algorithm Based on Extraction), which builds on TA (Threshold Algorithm), the NRA (No Random Access) algorithm, and the idea of approximate matching. TABE first extracts the optimal tuples and then runs the query algorithm on only these tuples. To evaluate TABE, the thesis designs an experiment that compares it with the classic NRA algorithm. The experimental results show that TABE not only has lower time complexity but also achieves a high degree of accuracy, sufficient for conventional queries. In response to the trend toward parallel processing of massive data, TABE is also implemented in a Hadoop environment, where its performance is tested as well. The test results show that TABE achieves higher query efficiency with cloud computing.

The thesis contributes useful research on fast query of massive data.
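To make the Top-k definition above concrete, the following is a minimal Python sketch, not the thesis's TABE algorithm (whose extraction step is not specified in this abstract). It shows a brute-force Top-k query and a simplified in-memory version of the classic Threshold Algorithm (TA) that the abstract names as a basis for TABE. The function names, the data layout (a dict mapping object ids to tuples of attribute grades), and the use of `sum` as the aggregation function are all assumptions made for illustration; TA requires the aggregation function to be monotone.

```python
import heapq


def top_k_naive(objects, k, aggregate=sum):
    """Brute force: grade every object, return the k best (grade, id) pairs."""
    scored = ((aggregate(grades), obj_id) for obj_id, grades in objects.items())
    return heapq.nlargest(k, scored)


def top_k_threshold(objects, k, aggregate=sum):
    """Simplified Threshold Algorithm (TA) over in-memory sorted lists.

    Assumes `aggregate` is monotone (e.g. sum, min) and every object
    has a grade for every attribute.
    """
    num_attrs = len(next(iter(objects.values())))
    # One list per attribute, object ids sorted by descending grade.
    sorted_lists = [
        sorted(objects, key=lambda o, i=i: objects[o][i], reverse=True)
        for i in range(num_attrs)
    ]
    seen = set()
    best = []  # min-heap holding the current k best (grade, id) pairs
    for depth in range(len(objects)):
        last_grades = []
        for i in range(num_attrs):
            obj_id = sorted_lists[i][depth]        # sorted access
            last_grades.append(objects[obj_id][i])
            if obj_id in seen:
                continue
            seen.add(obj_id)
            grade = aggregate(objects[obj_id])     # random access
            if len(best) < k:
                heapq.heappush(best, (grade, obj_id))
            elif grade > best[0][0]:
                heapq.heapreplace(best, (grade, obj_id))
        # No unseen object can have an overall grade above this threshold.
        threshold = aggregate(last_grades)
        if len(best) == k and best[0][0] >= threshold:
            break
    return sorted(best, reverse=True)


if __name__ == "__main__":
    # Hypothetical sample data: object id -> (grade on attr 0, grade on attr 1)
    sample = {
        "a": (90, 80),
        "b": (70, 95),
        "c": (20, 30),
        "d": (85, 10),
        "e": (50, 50),
    }
    print(top_k_threshold(sample, 2))
```

On the sample data, the TA sketch stops after reading two entries from each sorted list, whereas the brute-force version grades all five objects. Early termination of this kind is what makes Top-k processing attractive on large inputs.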
Keywords/Search Tags: Massive Data, Top-k, Hadoop, Hive