
Design And Implementation Of Data Processing Platform Based On Hadoop

Posted on: 2016-07-09
Degree: Master
Type: Thesis
Country: China
Candidate: X Z Chen
Full Text: PDF
GTID: 2308330482951587
Subject: Computer technology
Abstract/Summary:
With the arrival of big data, distributed platforms for processing it have become increasingly important and prominent. Hadoop is a well-known data processing system that is widely used in industry and academia. A Hadoop job runs a MapReduce program that handles the transport, computation, and transformation of large volumes of data. At any moment, many Hadoop jobs run concurrently in a data center, and managing them becomes increasingly difficult. Developing a Hadoop-based data management platform is therefore necessary.

This thesis analyzes the Hadoop Distributed File System (HDFS) and the MapReduce framework, and describes the design and implementation of the platform. The main contributions of this thesis are:

1. The design and implementation of a Hadoop-based data management platform. The platform treats generality, scalability, security, and efficiency as its basic design goals. The thesis designs and implements the logical function modules, database structures, and user interfaces.

2. The design and implementation of unified scheduling for Hadoop jobs. Business data is stored in a distributed manner on HDFS. By encapsulating the distributed processing programs with the MapReduce framework, the scheduling of the Hadoop jobs built on top becomes unified.

3. The design and implementation of operation-flow management for business processes. The image data processing workflow, which includes batch creation, data preparation, batch processing, batch checking, and batch storage, is carefully designed.

4. The implementation of the platform's account management system, which manages all users and accounts. Permissions for different modules are isolated, which makes the data on the platform more secure.

The Hadoop-based data management platform described in this thesis has been deployed in an Internet enterprise. It supports distributed storage of image business data, image stitching, blurring of privacy-sensitive image content, and effective process management. In the several months since it went online, the platform has been stable, has reduced labor costs, and has met our expectations.
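
The image-data workflow named in the abstract (batch creation, data preparation, batch processing, batch checking, batch storage) can be pictured as a batch that moves through a fixed sequence of stages. The following is only a minimal sketch under stated assumptions: the stage names come from the abstract, but the strictly linear ordering and the `Stage` and `Batch` names are hypothetical and are not taken from the thesis.

```python
from enum import Enum


class Stage(Enum):
    # Stage names follow the abstract; the strict linear order is an assumption.
    CREATION = 1
    PREPARATION = 2
    PROCESSING = 3
    CHECKING = 4
    STORAGE = 5


class Batch:
    """Hypothetical batch record that moves through the stages one at a time."""

    def __init__(self, name):
        self.name = name
        self.stage = Stage.CREATION  # every batch starts at creation

    def advance(self):
        """Move to the next stage; raise if the batch is already stored."""
        if self.stage is Stage.STORAGE:
            raise ValueError(f"batch {self.name!r} has already reached storage")
        self.stage = Stage(self.stage.value + 1)
        return self.stage
```

A real platform would presumably attach a Hadoop job and a permission check to each transition; the sketch only shows how a batch could be kept from skipping or repeating a stage.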
Keywords/Search Tags: Hadoop, HDFS, MapReduce, Nginx, FastCGI, MFC