Querying Relational Database Based On Hadoop Platform And Its Implementation

Posted on: 2012-01-13
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Huang
GTID: 2248330371965731
Subject: Computer technology
Abstract/Summary:

Cloud computing is a recently proposed computing model that grew out of distributed computing, parallel computing and grid computing. It simplifies the client, which is responsible only for data input and reading, and leaves large and complex tasks to the "cloud". Cloud storage is a system that uses application software to link many different types of networked storage devices so that together they provide data storage and business access. It relies on cluster, grid and distributed file system technologies. Among the many open source cloud platforms, Hadoop has attracted wide attention. Hadoop is developed by the Apache Software Foundation and is based on Google’s cloud computing concepts. Its core consists of the MapReduce programming model and HDFS, the Hadoop Distributed File System.

Traditional scientific data management systems are usually based on relational databases. But relational databases can manage only a relatively small amount of data and become inadequate for massive data. How to manage massive data effectively is a problem worth studying.

To address this problem, this thesis does the following work:

1. Designs and implements a traditional scientific data management system to make data easier to manage.
2. Moves the scientific data management system to the Hadoop cloud platform, implementing a data management system based on Hadoop.
3. Varies the number of metadata records and the number of nodes in the Hadoop cluster, records the query time under each condition, and compares it with the time MySQL takes to query the same data.

Finally, the thesis analyzes the advantages and disadvantages of using the Hadoop platform to query relational data. It also analyzes the factors that influence query time on a Hadoop cluster, providing a reference for further study and refinement of cloud-platform queries over relational data.
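To make the idea of querying relational data with MapReduce concrete, the following is a minimal in-process sketch in Python, not the thesis's implementation. It mimics how a query such as "count the records per instrument whose size is at least a threshold" could be split into a map phase (filter and emit key/value pairs), a shuffle (group by key), and a reduce phase (aggregate). The record fields (`instrument`, `size_mb`) and all function names are hypothetical and chosen only for illustration; a real Hadoop job would run these phases across the cluster nodes.

```python
from collections import defaultdict

# Hypothetical metadata records; the field names are illustrative only.
RECORDS = [
    {"id": 1, "instrument": "A", "size_mb": 120},
    {"id": 2, "instrument": "B", "size_mb": 40},
    {"id": 3, "instrument": "A", "size_mb": 300},
    {"id": 4, "instrument": "B", "size_mb": 250},
    {"id": 5, "instrument": "A", "size_mb": 10},
]


def map_phase(record, min_size_mb):
    """Emit (instrument, 1) for each record that passes the filter (the WHERE clause)."""
    if record["size_mb"] >= min_size_mb:
        yield record["instrument"], 1


def shuffle(pairs):
    """Group mapper output by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key, values):
    """Aggregate all values for one key (the COUNT(*) ... GROUP BY part)."""
    return key, sum(values)


def run_query(records, min_size_mb):
    """Run map, shuffle, and reduce sequentially in a single process."""
    mapped = (pair for record in records for pair in map_phase(record, min_size_mb))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_query(RECORDS, min_size_mb=100))
```

In this sketch everything runs sequentially, so it says nothing about performance; it only shows how a relational filter-and-aggregate query maps onto the two MapReduce phases.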
Keywords/Search Tags: Hadoop, Cloud computing, HDFS, MapReduce, Hive, Query relational database by cloud platform