The Research Of Performance Optimization Of Hadoop In Big Data

Posted on: 2016-01-09
Degree: Master
Type: Thesis
Country: China
Candidate: Y Liu
Full Text: PDF
GTID: 2308330461991711
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of information technology and the spread of the Internet, the Internet of Things, and mobile devices, the volume of data generated by society has grown exponentially. This data takes many forms, including text, voice, video, and web logs. The era of big data has arrived, and its key problem is how to analyze and process massive domain-specific data so that raw data becomes valuable and usable. Hadoop is an open-source platform that can analyze and process massive data effectively, with the advantages of high reliability, high scalability, high efficiency, and high fault tolerance. Studying and optimizing the performance of the Hadoop platform in a big data environment is therefore of practical significance.

First, this paper briefly introduces the Hadoop platform. It focuses on the two core components, HDFS and MapReduce, and studies their structure and operating principles in depth. Based on an analysis of the HDFS and MapReduce source code, it identifies the NameNode in HDFS as a single point of failure (SPOF). Building on existing solutions, it proposes an improved Avatar mechanism that supports automatic failover, handles secondary failures, and loses no data. To address the inefficiency of join algorithms in MapReduce, it presents a star join algorithm based on a counting Bloom filter. The algorithm effectively reduces disk I/O cost, greatly shortens join time, and speeds up the analysis of massive data.

Finally, an experimental platform was set up to verify the improved Avatar mechanism and the star join algorithm based on the counting Bloom filter. The results show that the improved mechanism and algorithm improve the performance of Hadoop to some extent.
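
The abstract does not describe the star join algorithm in detail, so the following is only a minimal Python sketch of the general idea behind filtering a star join with counting Bloom filters. It is not the thesis's implementation. The names CountingBloomFilter and filter_fact_rows, the parameters num_counters and num_hashes, the SHA-256-based hashing, and the sample column names are all assumptions made for illustration. The sketch builds one filter per dimension table from the keys that survive that dimension's predicate, then drops fact rows whose foreign keys fail any filter before they would reach the join.

```python
import hashlib


class CountingBloomFilter:
    """Bloom filter whose slots are counters, so keys can also be removed."""

    def __init__(self, num_counters=1024, num_hashes=3):
        # Both sizes are illustrative defaults, not values from the thesis.
        self.num_counters = num_counters
        self.num_hashes = num_hashes
        self.counters = [0] * num_counters

    def _positions(self, key):
        # Derive num_hashes counter indexes by salting a SHA-256 digest.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_counters

    def add(self, key):
        for pos in self._positions(key):
            self.counters[pos] += 1

    def remove(self, key):
        # Decrement only if the key appears present, so no counter goes negative.
        if key in self:
            for pos in self._positions(key):
                self.counters[pos] -= 1

    def __contains__(self, key):
        # True may be a false positive; False is always correct.
        return all(self.counters[pos] > 0 for pos in self._positions(key))


def filter_fact_rows(fact_rows, filters):
    """Keep fact rows whose foreign keys pass every dimension filter.

    filters maps a foreign-key column name to the filter built from the
    matching dimension table's selected keys.
    """
    return [
        row for row in fact_rows
        if all(row[column] in bloom for column, bloom in filters.items())
    ]


# Hypothetical star schema: one fact table referencing two dimensions.
product_filter = CountingBloomFilter()
store_filter = CountingBloomFilter()
for product_id in (1, 2):      # products that passed the dimension predicate
    product_filter.add(product_id)
store_filter.add(10)           # stores that passed the dimension predicate

fact_rows = [
    {"product_id": 1, "store_id": 10, "amount": 5},
    {"product_id": 3, "store_id": 10, "amount": 7},
    {"product_id": 2, "store_id": 11, "amount": 9},
]
candidates = filter_fact_rows(
    fact_rows, {"product_id": product_filter, "store_id": store_filter}
)
```

A Bloom filter never reports a false negative, so every fact row that truly joins is kept. A row that does not join may occasionally pass as a false positive, which means the actual join still has to check the surviving rows. Counters instead of single bits allow a key to be removed when a dimension row is deleted, at the price of more memory per slot.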
Keywords/Search Tags: big data, Hadoop, SPOF, Bloom filter, star join algorithm