
Design and Implementation of a MapReduce-based Structured Query Mechanism

Posted on: 2012-05-13
Degree: Master
Type: Thesis
Country: China
Candidate: B Fan
GTID: 2208330332986646
Subject: Software engineering
Abstract/Summary:
With the rapid development of Web 2.0 applications and cloud computing, storing and managing massive data has become an essential problem. Traditional data management is challenged to adapt to the characteristics of new web applications, and traditional RDBMSs can no longer cope with ever-growing data volumes. Because of large-scale computing and massive data storage, the focus of data management has shifted from consistency and availability to availability and partition tolerance, which raises higher requirements for scalability and availability. Existing data management systems designed for this setting meet these two requirements. Compared with an RDBMS, however, they support only key-based range and conditional queries, although range and conditional queries over multiple columns are a common demand on a DBMS. Because these systems rely on a MapReduce-based parallel full table scan, their query efficiency drops as the data scale grows. To address this inefficiency, this thesis proposes a MapReduce-based distributed query mechanism, built on a large-scale distributed structured data management system, that offers higher performance, stronger reliability, and lower storage overhead. The features and foundations of the distributed structured data management system are as follows. First, a Bigtable-like distributed structured data management system is designed and implemented on top of a P2P distributed storage system. The system mainly consists of a MapReduce distributed computing framework written in C++ on Linux and a distributed structured data management system built on that framework. Second, keeping multiple copies of a table, each sorted in a different order, speeds up multi-dimensional conditional queries. The primary key must be specified when a table is created; in addition, the LDS3 system supports indexing other columns. The table records are sorted by the primary key and by each of those indexed columns, and the records are stored on disk in those orders.
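As a rough illustration of the reordered-copy idea described above, the Python sketch below keeps one sorted copy of the records per indexed column and answers a range query by binary search, so the matching records form one contiguous slice. It is an in-memory toy under stated assumptions, not the thesis's C++ implementation: the class name `MultiOrderTable`, its methods, and the optional `predicate` argument are all hypothetical.

```python
from bisect import bisect_left, bisect_right


class MultiOrderTable:
    """Hypothetical sketch: one sorted copy of the records per indexed column."""

    def __init__(self, records, primary_key, indexed_columns=()):
        # column name -> (sorted key list, records in the same order)
        self.copies = {}
        for column in (primary_key, *indexed_columns):
            ordered = sorted(records, key=lambda r, c=column: r[c])
            keys = [r[column] for r in ordered]
            self.copies[column] = (keys, ordered)

    def range_query(self, column, low, high, predicate=None):
        """Return records with low <= record[column] <= high.

        Two binary searches find the bounds of one contiguous slice in the
        copy sorted by `column`; `predicate` (an assumption of this sketch)
        filters that slice on any remaining conditions.
        """
        keys, ordered = self.copies[column]
        start = bisect_left(keys, low)
        stop = bisect_right(keys, high)
        matches = ordered[start:stop]
        if predicate is not None:
            matches = [r for r in matches if predicate(r)]
        return matches
```

The cost of this design is storage: each additional indexed column adds another full copy of the table, which is the trade-off against scanning every record.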
When a conditional or range query is processed, the sub-table is first located using the mapping table between sub-tables and table servers; the results are then confined to contiguous records within that sub-table, which minimizes the number of random accesses. Combined with the MapReduce-based distributed computing framework, DVCP, the table records that satisfy the condition are queried in parallel. Finally, a Bitcask-based underlying storage model is designed and implemented, which has higher efficiency and lower complexity than the MapFile storage engine.
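To make the Bitcask reference more concrete, here is a minimal Python sketch of the general Bitcask idea: values are appended to a log file, and an in-memory directory maps each key to the position of its latest value, so a read needs a single seek. This is only an assumption-laden illustration, not the storage engine built in the thesis. The class name `TinyBitcask` and the record layout are hypothetical, and the checksums, timestamps, hint files, and log merging of a real Bitcask store are omitted.

```python
import os
import struct


class TinyBitcask:
    """Hypothetical sketch of an append-only log with an in-memory key directory."""

    HEADER = struct.Struct(">II")  # key length, value length (assumed layout)

    def __init__(self, path):
        self.keydir = {}  # key -> (value offset, value length)
        self.log = open(path, "a+b")  # creates the file if it is missing
        self._rebuild_keydir()

    def _rebuild_keydir(self):
        # Replay the log from the start; later records override earlier ones.
        self.log.seek(0)
        while True:
            header = self.log.read(self.HEADER.size)
            if len(header) < self.HEADER.size:
                break
            key_len, value_len = self.HEADER.unpack(header)
            key = self.log.read(key_len)
            value_offset = self.log.tell()
            self.log.seek(value_len, os.SEEK_CUR)
            self.keydir[key] = (value_offset, value_len)

    def put(self, key, value):
        # Writes are always appended; old values stay in the log as garbage.
        self.log.seek(0, os.SEEK_END)
        self.log.write(self.HEADER.pack(len(key), len(value)))
        self.log.write(key)
        value_offset = self.log.tell()
        self.log.write(value)
        self.log.flush()
        self.keydir[key] = (value_offset, len(value))

    def get(self, key):
        entry = self.keydir.get(key)
        if entry is None:
            return None
        value_offset, value_len = entry
        self.log.seek(value_offset)
        return self.log.read(value_len)

    def close(self):
        self.log.close()
```

Keys and values are `bytes` in this sketch; for example, `store.put(b"row1", b"payload")` followed by `store.get(b"row1")` reads the value back with one seek into the log.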
Keywords/Search Tags: MapReduce, multiple table copies with different orders, multi-dimensional conditional query, storage engine based on Bitcask