
Research And Application Of Big Data Migration And Query Based On Hadoop Platform

Posted on: 2015-10-03  Degree: Master  Type: Thesis
Country: China  Candidate: K Liu  Full Text: PDF
GTID: 2298330452950745  Subject: Computer application technology
Abstract/Summary:
An efficient data management system is essential for managing data applications. However, as data volumes keep growing, data types keep changing, and unstructured data becomes an essential part of data storage and processing, the dominance of relational databases is gradually being eroded. Relational databases cannot solve the problems that big data brings, nor can they meet the requirements of effective data storage, data analysis, and data access.

Because it hides the underlying storage and parallel processing from the user, and because it provides a high-performance cluster with both computing and storage capacity, Hadoop stands out in the fields of distributed computing and big data processing. However, using the Hadoop platform to process big data and achieve efficient queries requires first migrating the data from relational databases to Hadoop, then importing it into Hadoop for analysis and processing, and finally optimizing the core data processing so that database performance improves and continuously updated query requirements can be met.

Based on a discussion of the Hadoop platform architecture and data exchange principles, this thesis proposes a MapReduce-based method for data migration on the Hadoop platform. MapReduce achieves better concurrency, which makes data conversion more efficient.

First, the working mechanism of MapReduce and three common job schedulers in Hadoop were analyzed. The MapReduce job scheduler was then optimized by combining a priority (highest response ratio) scheduling algorithm with the fair scheduler.
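To make the migration idea concrete, the following is a minimal Python sketch of how a MapReduce-style import can parallelize reading a relational table, in the spirit of Apache Sqoop's split-by-key approach. It is an illustration under stated assumptions, not the thesis's implementation: it assumes an integer primary key, uses SQLite as a stand-in for the source RDBMS, and uses a thread pool in place of Hadoop map tasks. The functions `compute_splits`, `map_task`, `migrate`, and `make_demo_db` are hypothetical names. Each key-range split becomes one independent "map task" that converts its rows to tab-separated lines, which is where the concurrency gain comes from.

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor


def compute_splits(lo, hi, num_mappers):
    """Divide the inclusive key range [lo, hi] into at most num_mappers contiguous splits."""
    size = -(-(hi - lo + 1) // num_mappers)  # ceiling division
    splits, start = [], lo
    while start <= hi:
        end = min(start + size - 1, hi)
        splits.append((start, end))
        start = end + 1
    return splits


def map_task(db_path, table, key, split):
    """One hypothetical 'map task': read its key range and emit tab-separated lines."""
    conn = sqlite3.connect(db_path)  # each task opens its own connection
    try:
        # table/key are assumed trusted identifiers; only the bounds are bound parameters.
        rows = conn.execute(
            f"SELECT * FROM {table} WHERE {key} BETWEEN ? AND ? ORDER BY {key}", split
        ).fetchall()
    finally:
        conn.close()
    return ["\t".join(str(v) for v in row) for row in rows]


def migrate(db_path, table, key, num_mappers=4):
    """Boundary query, split computation, then parallel map tasks (one output part per split)."""
    conn = sqlite3.connect(db_path)
    try:
        lo, hi = conn.execute(f"SELECT MIN({key}), MAX({key}) FROM {table}").fetchone()
    finally:
        conn.close()
    if lo is None:  # empty table: nothing to migrate
        return []
    splits = compute_splits(lo, hi, num_mappers)
    with ThreadPoolExecutor(max_workers=num_mappers) as pool:
        # pool.map keeps results in split order, mirroring part-m-00000, part-m-00001, ...
        return list(pool.map(lambda s: map_task(db_path, table, key, s), splits))


def make_demo_db(n_rows):
    """Create a throwaway SQLite file with a 'users' table of n_rows rows (demo data only)."""
    path = os.path.join(tempfile.mkdtemp(), "source.db")
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)",
                     [(i, f"u{i}") for i in range(1, n_rows + 1)])
    conn.commit()
    conn.close()
    return path


if __name__ == "__main__":
    parts = migrate(make_demo_db(10), "users", "id", num_mappers=3)
    for i, lines in enumerate(parts):
        print(f"part-m-{i:05d}: {len(lines)} rows")
```

Splitting on key ranges keeps each task's query independent, so adding mappers raises read concurrency. The trade-off is skew: each split covers an equal key range rather than an equal row count, so unevenly distributed keys produce unbalanced tasks.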
This thesis proposes a scheduling algorithm that adds job priority to the fair scheduler, and listeners on the TaskTrackers are also used to assist the scheduling work. Next, the working mechanisms of HBase and Hive were analyzed, a method of combining Hive with HBase for data queries was explored, and a data query plan based on Hive-HBase was designed.

Finally, an experimental environment for data migration and data query was set up on the Hadoop platform. The migration performance of different scheduling algorithms was compared, as was the data query efficiency of the original system and of the system based on the Hive-HBase architecture.

The experiments show that the complete data processing scheme proposed in this thesis, covering both data migration and data query on the Hadoop platform, is feasible. Optimizing the scheduling algorithm improved data migration performance. Meanwhile, compared with traditional relational databases, the Hadoop platform for big data processing also showed an advantage in query efficiency. This thesis therefore offers some reference value for big data processing.
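The scheduling idea summarized above, a fair scheduler that also weighs job priority by response ratio, with TaskTracker-side listeners feeding it runtime information, can be sketched as a small standalone Python simulation. This is an assumption about how the pieces might fit together, not the thesis's algorithm and not Hadoop's scheduler API: the classes `Job`, `TaskTrackerListener`, and `HrrnFairScheduler`, the per-pool weights, and the default 10-second task estimate are all hypothetical. The priority uses the classic highest-response-ratio-next formula, (waiting time + estimated service time) / estimated service time, where the service time is estimated from task durations the listener has observed.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    pool: str
    submit_time: float
    pending_tasks: int
    default_task_secs: float = 10.0  # hypothetical estimate before any task has finished


class TaskTrackerListener:
    """Hypothetical listener: records finished-task durations reported by TaskTrackers."""

    def __init__(self):
        self._durations = {}  # job name -> observed task durations (seconds)

    def on_task_finished(self, job_name, duration_secs):
        self._durations.setdefault(job_name, []).append(duration_secs)

    def avg_task_secs(self, job):
        seen = self._durations.get(job.name)
        return sum(seen) / len(seen) if seen else job.default_task_secs


class HrrnFairScheduler:
    """Sketch: fair share across pools, highest response ratio next within a pool."""

    def __init__(self, pool_weights, listener):
        self.pool_weights = pool_weights               # pool name -> weight
        self.running = {p: 0 for p in pool_weights}    # pool name -> running tasks
        self.listener = listener
        self.jobs = []

    def submit(self, job):
        self.jobs.append(job)

    def response_ratio(self, job, now):
        # Estimated remaining service time, based on the listener's observations.
        service = self.listener.avg_task_secs(job) * job.pending_tasks
        return ((now - job.submit_time) + service) / service

    def assign_task(self, now):
        """Called when a TaskTracker reports a free slot; returns the chosen job or None."""
        runnable = [j for j in self.jobs if j.pending_tasks > 0]
        if not runnable:
            return None
        # Fair-share step: the pool furthest below its weighted share (name breaks ties).
        pool = min({j.pool for j in runnable},
                   key=lambda p: (self.running[p] / self.pool_weights[p], p))
        # Priority step: within that pool, the job with the highest response ratio.
        job = max((j for j in runnable if j.pool == pool),
                  key=lambda j: self.response_ratio(j, now))
        job.pending_tasks -= 1
        self.running[pool] += 1
        return job

    def task_finished(self, job, duration_secs):
        """Free the slot and let the listener record the observed duration."""
        self.running[job.pool] -= 1
        self.listener.on_task_finished(job.name, duration_secs)
```

Because waiting time grows in the numerator, a long-waiting large job eventually outranks newly arrived small jobs, which is how response-ratio scheduling avoids the starvation that pure shortest-job-first would cause; the fair-share step still keeps any single pool from monopolizing slots.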
Keywords/Search Tags: Hadoop, MapReduce, HBase, Hive, Data migration, Data query