Design And Implementation Of Data Processing And Analysis System Based On Spark

Posted on:2016-11-07

Degree:Master

Type:Thesis

Country:China

Candidate:S Li

Full Text:PDF

GTID:2308330470455538

Subject:Software engineering

Abstract/Summary:

With the rapid growth of computer and information technology, application systems have expanded dramatically, and the volume of data they generate has grown explosively. Effective technologies and methods for handling this data are urgently needed. Hadoop and Spark are currently two of the most popular distributed computing frameworks. To meet their business requirements and improve the quality of their products, an increasing number of companies and institutions have begun to learn and use these two technologies, which have gradually matured. In this context, a company proposed building a data processing and analysis system based on Spark, namely ATL-DPAS (data processing and analysis system of acn technology lab). The system is not only compatible with Hadoop clusters but can also flexibly process large volumes of data and support real-time remote querying and visual analysis using the existing computing resources.

According to its functional requirements, the system is divided into three modules: a data processing module, a data query module, and a data modeling module. I was primarily responsible for the design and development of the data processing module, including the design of the processing flow, the design and implementation of the data cleaning and merging algorithms, and the implementation of a variety of data processing interfaces.

This paper first reviews the research status of the ATL data processing and analysis system and introduces the related theory and technology. Second, to clarify the aims of the system, it presents the system requirements analysis, covering both functional and non-functional requirements, and then introduces the system architecture design and database design. Third, it presents the flow design, code implementation, and interface display for the data processing module, which includes HDFS list, data adding, data cleaning, data combining, and data type management. Finally, it introduces the system deployment process and the testing approach, describes the function tests and performance tests of the data processing module in detail, and evaluates the test results, which confirm the strong performance of Spark and demonstrate the validity and practicability of the work.

At present, the system is in trial operation and processes and analyzes hundreds of gigabytes of data every day. The operating results show that the system runs stably with excellent performance and achieves the desired objectives.

Keywords/Search Tags:

Big data, Data processing, Spark, HDFS, Hive

Related items

1	Design And Implementation Of Agricultural Product E-commerce Data Warehouse Analysis And Evaluation System Based On Hive On Spark
2	Design And Implementation Of NetEase Mobile Big Data Support Platform Based On Spark And Hive
3	The Research And Practice Of Performance Optimization Based On Hive
4	Query Optimization In Spark SQL For Business Data Of 4G Industry Card Based On HDFS
5	Application Of Spark-based Real-time Efficient Processing Algorithm In Internet User Behavior Analysis Platform
6	Design And Implementation Of Insurance Data Warehouse System Based On Hive
7	Design And Optimization Of Big Data Analysis Platform Based On Spark And HDFS
8	Application Research Of Real-time Data Analysis Based On Spark Computing
9	Method And Implementation For Hive-Based Offline Data Processing
10	Design And Implementation Of Telecom 4G Big Data Platform For Network Optimization Based On Spark