Design And Implementation Of Big Data Processing Platform Based On Hadoop

Posted on:2018-05-18

Degree:Master

Type:Thesis

Country:China

Candidate:T F He

Full Text:PDF

GTID:2348330569485791

Subject:Software engineering

Abstract/Summary:


With the rapid development of science and technology, ever more data is being produced, and big data processing has become one of the most active research areas in recent years. In practice, however, the adoption of big data processing technology lags far behind the rate at which data is generated, so many companies cannot process their data in a timely manner and therefore cannot extract its value. How to process large data sets efficiently is the main subject of this thesis. The processing consists of data extraction, data transformation, and data loading, known collectively as the ETL process. This thesis builds a big data processing platform that combines the Hadoop big data storage architecture, Hive, Flume for data acquisition, and Sqoop for data synchronization to process large data sets efficiently. Hadoop is currently the most popular framework for big data processing. Its advantages include high reliability, high scalability, high efficiency, and low cost. Hadoop's implementation of the MapReduce computing framework is an efficient parallel framework. Hadoop users must write dedicated MapReduce programs to handle their tasks, but because Hadoop exposes only low-level interfaces, even a simple task requires a lot of code, and that code is hard to reuse. The emergence of Hive largely solves this problem. Hive is an open-source data warehouse tool built on Hadoop that supports an SQL-like language, HQL. Hive compiles HQL into MapReduce programs and so can draw on Hadoop's efficient parallel processing. As a result, Hive users can develop rapidly while writing only a small amount of code. This thesis therefore chooses Hive as the tool for data cleaning and processing. Based on in-depth study of these big data technologies, especially Hadoop and Hive, a Hadoop-based big data processing platform is developed. In the ETL process, data transformation takes the longest time. The thesis therefore focuses on the principles and methods of Hive QL optimization and applies them to optimize the Hive QL used for actual business data processing.
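
To make the relationship between an SQL-like query and MapReduce more concrete, the following is a minimal conceptual sketch in plain Python, not the thesis's implementation and not Hive's actual compiler output. It assumes a hypothetical table `access_log` with a column `page`, and shows how a simple grouped count could be broken into map, shuffle, and reduce steps. All function names here are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical query, for illustration only (table and column names are assumptions):
#   SELECT page, COUNT(*) FROM access_log GROUP BY page;

Record = Dict[str, str]


def map_phase(records: Iterable[Record]) -> Iterator[Tuple[str, int]]:
    """Emit one (key, 1) pair per input row, keyed by the GROUP BY column."""
    for record in records:
        yield record["page"], 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Collect all values that share a key, as happens between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Aggregate each key's values; summing the ones corresponds to COUNT(*)."""
    return {key: sum(values) for key, values in grouped.items()}


def count_by_page(records: Iterable[Record]) -> Dict[str, int]:
    """Run the three phases in sequence on an in-memory list of rows."""
    return reduce_phase(shuffle_phase(map_phase(records)))


if __name__ == "__main__":
    rows = [{"page": "/home"}, {"page": "/cart"}, {"page": "/home"}]
    print(count_by_page(rows))
```

The one-line query in the comment replaces all three hand-written phases, which is the saving in code volume the abstract attributes to Hive. A real Hadoop job would distribute these phases across a cluster rather than run them in a single process.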

Keywords/Search Tags:

Big Data Processing, Hive, Hadoop, ETL, Optimization


Related items

1	Design And Implementation Of Big Data Processing Platform Based On Hadoop
2	Optimization Of Data Processing Scheme On Financial Big Data System
3	The Design And Implementation Of Network Authentication System Based On Hadoop/Hive
4	Compatible Study Of Hadoop For Efficient Analyzing And Processing Of Big Data
5	Design And Implementation Of Massive Web Log Analysis System Based On Hadoop/Hive
6	Research On Hadoop-based MeteCloud Resource Storage And Data Processing
7	Realization And Optimization Of Dairy Traceability System Based On Hadoop/Hive
8	Design And Implementation Of Hive-based Purchase And Sale Data Warehouse System
9	Implementation And Application Of E-commerce Data Analysis Platform Based On Hive
10	The Help Of Book Lessons For Early Education Based On Hadoop