Design And Implementation Of MapReduce-based Structured Query Mechanism

Posted on:2012-05-13

Degree:Master

Type:Thesis

Country:China

Candidate:B Fan

Full Text:PDF

GTID:2208330332986646

Subject:Software engineering

Abstract/Summary:

With the rapid development of Web 2.0 applications and cloud computing, storing and managing massive data has become an essential problem. Traditional data management is struggling to adapt to the characteristics of new web applications, and traditional RDBMSs can no longer cope with ever-growing data volumes. Because of large-scale computing and massive data storage, the focus of data management has shifted from consistency and availability to availability and partition tolerance, which raises higher requirements for scalability and availability. Existing DBMSs built for this setting meet both requirements. Compared with an RDBMS, however, they support only key-based range and conditional queries, while range and conditional queries over multiple columns are a common requirement of a DBMS. In existing systems such queries rely on a MapReduce-based parallel full table scan, so query efficiency drops as the data scale grows. To address this inefficiency, a MapReduce-based distributed query mechanism is proposed, built on a large-scale distributed structured data management system, with higher performance, stronger reliability, and lower storage overhead. The features and foundations of the distributed structured data management system are as follows.

First, a Bigtable-like distributed structured data management system is designed and implemented on top of a P2P distributed storage system. The system mainly consists of a MapReduce distributed computing framework written in C++ on Linux and a distributed structured data management system built on that framework.

Second, keeping multiple copies of a table, each sorted in a different order, speeds up multi-dimensional conditional queries. The primary key of a table must be specified when the table is created; in addition, the LDS3 system supports indexing other columns. The table records are sorted by the primary key and by each of these indexed columns, and are stored on disk in those orders.
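The idea of keeping one sorted copy of a table per indexed column can be illustrated with a small in-memory sketch. This is not the LDS3 implementation: the class name `MultiOrderTable`, the method `range_query`, and the sample records are hypothetical, and the copies live in Python lists rather than on disk. It only shows why a range condition on an indexed column reduces to one contiguous slice of the matching copy.

```python
from bisect import bisect_left, bisect_right
from operator import itemgetter


class MultiOrderTable:
    """Hypothetical sketch: one sorted copy of the records per indexed column."""

    def __init__(self, records, primary_key, indexed_columns=()):
        self.copies = {}
        for col in (primary_key, *indexed_columns):
            # Each copy holds all rows, ordered by a different column.
            rows = sorted(records, key=itemgetter(col))
            keys = [row[col] for row in rows]
            self.copies[col] = (keys, rows)

    def range_query(self, column, low, high):
        """Return rows with low <= row[column] <= high as one contiguous slice."""
        keys, rows = self.copies[column]
        start = bisect_left(keys, low)    # first position with key >= low
        end = bisect_right(keys, high)    # first position with key > high
        return rows[start:end]


# Sample data, invented for illustration only.
records = [
    {"id": 3, "age": 41, "city": "Chengdu"},
    {"id": 1, "age": 25, "city": "Beijing"},
    {"id": 2, "age": 33, "city": "Shanghai"},
    {"id": 4, "age": 29, "city": "Beijing"},
]
table = MultiOrderTable(records, primary_key="id", indexed_columns=("age",))
by_age = table.range_query("age", 28, 35)  # served from the copy sorted by age
by_id = table.range_query("id", 2, 3)      # served from the copy sorted by id
```

In this sketch a query on a column without a sorted copy raises `KeyError`; a real system would presumably fall back to a scan for such columns.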
When a conditional or range query is processed, the sub-table is first located using the mapping table between sub-tables and table servers; the results are then confined to contiguous records within that sub-table, which minimizes the number of random accesses. Combined with the MapReduce-based distributed computing framework, DVCP, the table records that satisfy the condition are queried in parallel.

Last, a Bitcask-based underlying storage model is designed and implemented, which is more efficient and less complex than the MapFile storage engine.
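As a rough illustration of the Bitcask idea mentioned above, the sketch below pairs an append-only log file with an in-memory key directory that records where each latest value sits in the log. It is a simplified assumption-laden example, not the storage engine from the thesis: the class name `TinyBitcask`, the record layout, and the single-file design are hypothetical, and features of a full Bitcask store such as checksums, timestamps, multiple data files, and merging are omitted.

```python
import os
import struct
import tempfile


class TinyBitcask:
    """Hypothetical sketch: append-only log plus in-memory key directory."""

    HEADER = struct.Struct(">II")  # key length, value length (assumed layout)

    def __init__(self, path):
        self.keydir = {}              # key -> (value offset, value length)
        self.log = open(path, "a+b")  # all writes go to the end of the file

    def put(self, key: bytes, value: bytes) -> None:
        self.log.seek(0, os.SEEK_END)
        offset = self.log.tell()
        self.log.write(self.HEADER.pack(len(key), len(value)) + key + value)
        self.log.flush()
        # Point the key at its newest value; older entries stay in the log.
        value_offset = offset + self.HEADER.size + len(key)
        self.keydir[key] = (value_offset, len(value))

    def get(self, key: bytes):
        entry = self.keydir.get(key)
        if entry is None:
            return None
        value_offset, length = entry
        self.log.seek(value_offset)   # one seek, one read per lookup
        return self.log.read(length)

    def close(self) -> None:
        self.log.close()


path = os.path.join(tempfile.mkdtemp(), "data.log")
store = TinyBitcask(path)
store.put(b"row1", b"alice")
store.put(b"row2", b"bob")
store.put(b"row1", b"carol")      # supersedes the first row1 entry
latest = store.get(b"row1")
second = store.get(b"row2")
missing = store.get(b"nope")
store.close()
```

Each lookup costs one dictionary access and one positioned read, which is the property that makes this layout simple to reason about.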

Keywords/Search Tags:

MapReduce, multiple table copies with different orders, multi-dimensional conditional query, Bitcask-based storage engine

Related items

1	Heuristics for multiple orders per job scheduling problems
2	Design And Realization Of Multi-Service Vertical Searching Engine Framework In Enterprises
3	Sharing Query Results In MapReduce Framework
4	Research On An Efficient Top-k Query Algorithm Based On MapReduce
5	Research On Key Technology Of Optimization For Multi Join Based On Hadoop
6	Research Of Multi-engine Cloud Security Mechanism Based On Conditional Random Fields
7	Research On Query Optimization Of Distributed Database Middleware Mycat
8	The Improvement Of Genetic Algorithm In Multiple-tables Query Of Database
9	Studies On Query Processing And Optimization Techniques Based On MapReduce
10	Copy-move Detection Method Based On Conditional Generative Adversarial Networks