Research On Fast Query Algorithm Of Massive Data

Posted on:2013-02-13

Degree:Master

Type:Thesis

Country:China

Candidate:X Ceng

Full Text:PDF

GTID:2218330371457606

Subject:Computer software and theory

Abstract/Summary:

With the rise of new applications such as blogs, wikis, shared spaces, and Twitter, the Internet has entered an era of information explosion: more and more data must be processed, and the demands on data processing keep increasing. As data volumes grow, fast query algorithms for massive data have become a hot research topic.

This thesis studies existing massive-data query technologies in order to improve query efficiency, and focuses on Top-k query algorithms. A Top-k query computes a grade for each attribute based on the user's request, combines the attribute grades into an overall grade using an aggregation function, and returns the k objects with the highest overall grades. Top-k queries are efficient in massive-data environments.

The thesis first describes existing massive-data query technologies, such as indexing, SQL statement optimization, data prefetching, approximate matching, and distributed query, and summarizes the scope of application of each. It then proposes a new algorithm called TABE (Top-k Algorithm Based on Extraction), which builds on TA (Threshold Algorithm), the NRA (No Random Access) algorithm, and the idea of approximate matching. TABE first extracts the optimal tuples and then runs the query algorithm on those tuples.

To evaluate TABE, the thesis designs an experiment that compares it with the classic NRA algorithm. The experimental results show that TABE not only has lower time complexity but also achieves a high degree of accuracy, sufficient to meet conventional query needs. In response to the trend toward parallel processing of massive data, TABE is also implemented and tested in a Hadoop environment. The test results show that TABE achieves higher query efficiency with cloud computing. The thesis thus contributes useful research on fast querying of massive data.
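The Top-k query and the Threshold Algorithm (TA) named in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's TABE algorithm, whose details the abstract does not give; it is only a textbook-style TA next to a naive baseline. The function names, the sample data, the use of `sum` as the aggregation function, and the convention that a missing grade counts as 0 are all assumptions made for illustration. TA's early stop is only valid when the aggregation function is monotone and grades are non-negative.

```python
import heapq
from typing import Callable, Dict, List, Sequence, Tuple

Grades = Dict[str, float]            # object id -> grade for one attribute
Agg = Callable[[Sequence[float]], float]
Result = List[Tuple[str, float]]     # (object id, overall grade), best first


def _overall(obj: str, lists: Sequence[Grades], agg: Agg) -> float:
    # Assumption: an object missing from a list has grade 0 for that attribute.
    return agg([grades.get(obj, 0.0) for grades in lists])


def _best(scores: Dict[str, float], k: int) -> Result:
    # Ties on the overall grade are broken by object id (an arbitrary choice).
    return heapq.nlargest(k, scores.items(), key=lambda kv: (kv[1], kv[0]))


def naive_top_k(lists: Sequence[Grades], k: int, agg: Agg = sum) -> Result:
    """Baseline: aggregate the grades of every object, then keep the k best."""
    objects = set().union(*lists)
    return _best({obj: _overall(obj, lists, agg) for obj in objects}, k)


def threshold_top_k(lists: Sequence[Grades], k: int, agg: Agg = sum) -> Result:
    """Threshold Algorithm sketch: stop once no unseen object can enter the top k."""
    # Sorted access: one list per attribute, ordered by descending grade.
    ranked = [sorted(g.items(), key=lambda kv: -kv[1]) for g in lists]
    seen: Dict[str, float] = {}
    max_depth = max((len(r) for r in ranked), default=0)
    for depth in range(max_depth):
        last_grades = []
        for r in ranked:
            if depth < len(r):
                obj, grade = r[depth]
                last_grades.append(grade)
                if obj not in seen:
                    # Random access: look up the object's grade in every list.
                    seen[obj] = _overall(obj, lists, agg)
            else:
                last_grades.append(0.0)  # exhausted list: unseen objects score 0 here
        # Upper bound on the overall grade of any object not yet seen.
        threshold = agg(last_grades)
        best = _best(seen, k)
        if len(best) == k and best[-1][1] >= threshold:
            return best
    return _best(seen, k)


if __name__ == "__main__":
    sample = [
        {"a": 0.9, "b": 0.8, "c": 0.1, "d": 0.3},   # grades for attribute 1
        {"a": 0.2, "b": 0.7, "c": 0.9, "d": 0.4},   # grades for attribute 2
    ]
    print(threshold_top_k(sample, k=2))
```

On the sample data both functions return objects `b` and `a`, in that order. Unlike the baseline, TA can stop before it has read every list to the end, and that early stop is the kind of saving Top-k algorithms aim for on large inputs.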

Keywords/Search Tags:

Massive Data, Top-k, Hadoop, Hive

Related items

1	Design And Implementation Of Massive Web Log Analysis System Based On Hadoop/Hive
2	Research On Fast Query Algorithm Of Massive Data
3	Performance Optimization Of A Massive Data Query And Analysis System On Hadoop
4	The Help Of Book Lessons For Early Education Based On Hadoop
5	Design And Implementation Of Contextual Marketing Based On Distributed Computing Hive And Data Mining
6	Design And Implementation Of Hive-based Purchase And Sale Data Warehouse System
7	Implementation And Application Of E-commerce Data Analysis Platform Based On Hive
8	A Design And Implementation On Storage Structure Extension Of Big Data Warehouse Hive
9	Compatible Study Of Hadoop For Efficient Analyzing And Processing Of Big Data
10	The Design And Implementation Of Massive Data Storage And Calculation Platform Based On Hadoop