
The Research And Application Of Distributed System Based On Hadoop

Posted on: 2015-01-20 | Degree: Master | Type: Thesis
Country: China | Candidate: Z Wang | Full Text: PDF
GTID: 2268330428985650 | Subject: Software engineering
Abstract/Summary:
The concept of big data has become popular in recent years. With the rapid development of computer technology, people have gained access to more and more information, and the volume of data has grown explosively. According to a report from the U.S. Internet Data Center, Internet data grows by about 50 percent annually; that is to say, 90% of the world's electronic data was produced only in recent years. Dealing with such massive data has become a popular and practical research topic. This massive data is what is called big data, and its common characteristics are Volume, Velocity, and Variety. Big data has a complex structure and differs from the ordinary information stored in databases because it has no standard structure; at first glance it may appear disorganized, with no discernible pattern. Data processing software in current use takes too long to handle big data with conventional methods, which is not conducive to business decision making. Although big data is enormous and messy, professional processing yields deeper information with greater value for decision making and insight. How to process it quickly enough to meet the needs of enterprises is therefore the main focus of this study, as well as a forefront of industry technology.
Among the processing methods, distributed computing is a promising approach. In distributed computing, a task is divided among individual computers connected by a network, each computer computes its part, and the final result is merged by a central node. The concept of distributed computing predates big data. Its most important goals are sharing scarce resources and balancing load. Compared with stand-alone computing, distributed computing costs much less per device: scientific computing previously had to rely on the performance of a single powerful machine to complete a huge amount of computation.
Distributed computing systems not only reduce the cost of the system but also provide two features: load balancing and resource sharing. In this thesis, we use the popular open-source Hadoop distributed system to achieve this goal. Hadoop not only has the characteristics of distributed computing but is also well suited to enterprises because of its reasonable cost and easy-to-deploy Java-based implementation. We deployed distributed computing in a practical environment in the financial industry and demonstrate, from a practical point of view, that it can provide the computing power needed for big data processing while greatly reducing corporate operating costs. We regard this as a general trend in the future development of enterprises.
Keywords/Search Tags: Big Data, Distributed computing, Hadoop cluster, load balancing