The Research Of Performance Optimization Of Hadoop In Big Data

Posted on:2016-01-09

Degree:Master

Type:Thesis

Country:China

Candidate:Y Liu

Full Text:PDF

GTID:2308330461991711

Subject:Computer Science and Technology

Abstract/Summary:

With the rapid development of information technology and the spread of the Internet, the Internet of Things, and mobile devices, the volume of data generated by society has grown exponentially. The data take many forms, including text, voice, video, and web logs. The era of big data has arrived, and its key problem is how to analyze and process massive domain-specific data so that raw data become valuable and usable. Hadoop is an open-source platform that can analyze and process massive data effectively, with the advantages of high reliability, high scalability, high efficiency, and high fault tolerance. Studying and optimizing the performance of the Hadoop platform in a big data environment is therefore of practical significance.

First, this thesis briefly introduces the Hadoop platform, focusing on its core components, HDFS and MapReduce, and examines their architecture and operating principles in depth. Based on an analysis of the HDFS and MapReduce source code, it identifies the NameNode in HDFS as a single point of failure (SPOF). Building on existing solutions, it proposes an improved Avatar mechanism that supports automatic failover, tolerates a secondary failure, and loses no data. To address the inefficiency of join processing in MapReduce, it presents a star join algorithm based on a counting Bloom filter. The algorithm can effectively reduce disk I/O cost, greatly shorten join time, and speed up the analysis of massive data.

Finally, an experimental platform was set up to verify the improved Avatar mechanism and the star join algorithm based on the counting Bloom filter. The results show that the improved mechanism and algorithm improve the performance of Hadoop to some extent.
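The abstract does not give the details of the star join algorithm, so the following is only a minimal single-process sketch of the general idea, not the thesis's implementation. It assumes one counting Bloom filter is built per dimension table from the keys that survive that dimension's predicates, and that fact rows are probed against every filter before the exact join. The names `CountingBloomFilter`, `star_join`, and the sample column names are hypothetical.

```python
import hashlib


class CountingBloomFilter:
    """Bloom filter with integer counters instead of bits, so keys can be removed."""

    def __init__(self, num_counters=1024, num_hashes=3):
        self.num_counters = num_counters
        self.num_hashes = num_hashes
        self.counters = [0] * num_counters

    def _positions(self, key):
        # Derive num_hashes positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(str(key).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_counters for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.counters[pos] += 1

    def remove(self, key):
        # Only call this for keys that were added; removing a false positive
        # would corrupt the counters of other keys.
        if key in self:
            for pos in self._positions(key):
                self.counters[pos] -= 1

    def __contains__(self, key):
        # May return a false positive, never a false negative.
        return all(self.counters[pos] > 0 for pos in self._positions(key))


def star_join(fact_rows, dimensions):
    """Join fact rows with several dimension tables.

    dimensions maps a foreign-key column name to a dict of
    {key: dimension_row}, already restricted by that dimension's predicates.
    """
    # Build one filter per dimension from its surviving keys.
    filters = {}
    for fk, table in dimensions.items():
        cbf = CountingBloomFilter()
        for key in table:
            cbf.add(key)
        filters[fk] = cbf

    joined = []
    for row in fact_rows:
        # Cheap probe: discard fact rows that cannot match some dimension.
        if not all(row[fk] in filters[fk] for fk in dimensions):
            continue
        # Exact lookup removes any Bloom filter false positives.
        if all(row[fk] in dimensions[fk] for fk in dimensions):
            merged = dict(row)
            for fk, table in dimensions.items():
                merged.update(table[row[fk]])
            joined.append(merged)
    return joined


facts = [
    {"order_id": 1, "cust_id": 10, "prod_id": 100},
    {"order_id": 2, "cust_id": 11, "prod_id": 100},
    {"order_id": 3, "cust_id": 10, "prod_id": 101},
]
dims = {
    "cust_id": {10: {"cust_name": "Ann"}},
    "prod_id": {100: {"prod_name": "Pen"}},
}
result = star_join(facts, dims)
```

In a MapReduce setting the filters would typically be distributed to the map tasks so that non-matching fact rows are dropped before the shuffle, which is where a saving in disk I/O would come from. The counters, rather than plain bits, allow a key to be removed from a filter when a dimension row is deleted, without rebuilding the filter.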

Keywords/Search Tags:

big data, Hadoop, SPOF, Bloom Filter, star join algorithm

Related items

1	Researches And Applications On Efficient Bloom Filter For Big Data
2	Research On Equi-Join Optimization Algorithms On Spark SQL
3	Research On Top-K Join Algorithm Based On The Star Schema
4	Hadoop Based Efficient Join Algorithm Research On GPU
5	Research And Application Of SQL Join Optimization Based On Spark
6	Research And Application Of Data Deduplication Technology Based On Bloom Filter
7	Research And Implementation Of Multi-Way Join Framework Based On Map-Reduce
8	Research On Query Analysis And Optimization Based On Spark System
9	Efficient Star Join For Column-Oriented Data Store In The MAP Reduce Environment
10	Join Processing And Optimizing On Large Data Sets Based On Hadoop Framework