The Research And Application Of Distributed System Based On Hadoop

Posted on:2015-01-20

Degree:Master

Type:Thesis

Country:China

Candidate:Z Wang

GTID:2268330428985650

Subject:Software engineering

Abstract/Summary:

The concept of big data has become popular in recent years. With the rapid development of computer technology, people have gained access to more and more information, and the volume of data has grown explosively. According to a report from the U.S. Internet Data Center, Internet data grows by about 50 percent annually; that is to say, 90% of the world's electronic data was produced only in recent years. Processing this massive data has therefore become a popular and practical research topic. Such massive data is what is called big data, and its common characteristics are Volume, Velocity, and Variety. Big data has a complex structure and differs from the ordinary information stored in databases because it has no standard structure; at first glance it may appear to follow no pattern, or even to be disorganized. Data processing software in current use takes too long to handle big data with conventional methods, which is not conducive to business decision-making. Although big data is vast and messy, professional processing can extract deeper information from it, with greater decision-making and insight value. How to process it as quickly as possible to meet the needs of enterprises is therefore the main focus of this study, as well as the forefront of industry technology. Among the processing methods, distributed computing is a promising approach. Distributed computing, by definition, distributes work that would otherwise be done by a single computer across the nodes of a network, and a central node merges the partial results into the final result. The concept of distributed computing predates big data. Its most important goals are sharing scarce resources and load balancing. Unlike stand-alone computing, distributed computing costs much less per device: scientific computing previously had to rely on a single powerful machine, using the high performance of that one computer to complete a huge amount of computation.
Distributed computing systems not only reduce the cost of the system but also provide two features: load balancing and resource sharing. In this paper, we use the popular open-source Hadoop distributed system to achieve this goal. Hadoop not only meets the characteristics of distributed computing but is also well suited to enterprises because of its reasonable cost and its easy-to-deploy, Java-based implementation. We deployed distributed computing in a practical environment in the financial industry and demonstrate, from a practical point of view, that it can provide the computing power needed for big data processing while greatly reducing corporate operating costs. It is a general trend in the future development of enterprises.
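
The split, compute, and merge pattern described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation and does not use Hadoop's API: local threads stand in for cluster nodes, word counting stands in for the real workload, and all function names (`split_into_chunks`, `count_words`, `merge`, `distributed_word_count`) are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, num_workers):
    """Deal records out round-robin so every worker gets a similar share."""
    return [records[i::num_workers] for i in range(num_workers)]


def count_words(chunk):
    """Work done independently by one worker: count the words in its chunk."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def merge(partials):
    """Central step: combine the partial counts into the final result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def distributed_word_count(records, num_workers=3):
    """Split the input, process the chunks in parallel, then merge."""
    chunks = split_into_chunks(records, num_workers)
    # Assumption: threads simulate separate nodes; a real cluster would
    # ship each chunk to a different machine.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(count_words, chunks))
    return merge(partials)


lines = [
    "big data needs big clusters",
    "hadoop handles big data",
    "clusters share load",
]
result = distributed_word_count(lines)
print(result.most_common(3))
```

Dealing the records out round-robin keeps the chunk sizes within one record of each other, which is the simplest form of the load balancing the abstract refers to.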

Keywords/Search Tags:

Big Data, Distributed computing, Hadoop cluster, load balancing

Related items

1	Research On Energy-aware Load Balancing In Heterogeneous Hadoop Cluster
2	Load Balancing Problems For Parallel And Distributed Computing
3	Based On Feedback Scheduling Algorithms For Dynamic Load Balancing In The Heterogeneous Environment Of Hadoop Design And Implementation
4	Research On Optimization Of Data Load Balancing In Hadoop Clusters And Application Of Hadoop Platform
5	Research On Load Balancing Algorithm For Scheduling Based On Hadoop
6	Research And Application Of Load Balancing Technology In Semantic Switch
7	Cluster Computing Load Balancing Strategy In The Visible Platform
8	Distributed Database Cluster System Zd-ddb Design And Implementation
9	Research On Key Technologies Of Load Balancing Virtual Cloud Platform And Database Cluster
10	Load Balancing Of Distributed Data Stream Processing System Research And Implementation Of Technology