Optimization Of Data Processing Scheme On Financial Big Data System

Posted on: 2016-11-20
Degree: Master
Type: Thesis
Country: China
Candidate: L Tang
Full Text: PDF
GTID: 2298330467491912
Subject: Electronics and Communications Engineering

Abstract/Summary:
With the popularity of information technology, large amounts of data are generated across industries, bringing new challenges to data processing. Big data has received attention from all fields. In the financial sector, data processing plays a central role in IT systems, and an effective data processing scheme is imperative to handle data of ever-increasing volume. Data processing technology has also advanced alongside this growth: the release of Hadoop set a new trend in data processing thanks to its open-source nature, scalability, and effectiveness, and it has been widely adopted in industry.

This paper focuses on data processing in the financial domain and introduces a specialized data processing system based on Hadoop that meets requirements for fast loading, management, and querying. Our work is based on a project for a financial company whose system is built on Hadoop, and it addresses the following problems: data grows every day as the financial business develops, and traditional technology can no longer handle data at such scale; the original Hadoop deployment is applicable to this company, but many problems remain to be solved, such as low utilization of computing resources.

In this dissertation we introduce an effective way to optimize data processing. The dissertation is organized as follows. First, we introduce the key components of Hadoop, including HDFS and MapReduce, and describe in detail the architecture, HiveQL, file formats, and UDFs of Hive, since these are important parts of our system. Then we analyze the original system in depth. To confirm user habits in daily business, we use HiveQL and Linux tools to analyze logs in Hadoop, and we examine the schema and fields of every physical table to find storage and query performance problems.
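The log analysis described above can be sketched as follows. This is a minimal illustration only: the log-line format, table names, and field layout are hypothetical stand-ins, since the thesis does not specify the company's actual log format.

```python
import re
from collections import Counter

# Hypothetical log lines; a real Hive/Hadoop deployment's audit logs
# will differ, and the table names here are invented for illustration.
LOG_LINES = [
    "2015-06-01 09:12:03 user=analyst1 cmd=SELECT * FROM trade_daily WHERE dt='2015-05-31'",
    "2015-06-01 09:15:47 user=analyst2 cmd=SELECT amount FROM trade_daily WHERE dt='2015-05-31'",
    "2015-06-01 10:02:11 user=analyst1 cmd=SELECT * FROM customer_info",
]

# Extract the table name following each FROM clause.
TABLE_RE = re.compile(r"\bFROM\s+(\w+)", re.IGNORECASE)

def table_frequencies(lines):
    """Count how often each physical table appears in the query logs,
    revealing which tables dominate daily business queries."""
    counts = Counter()
    for line in lines:
        for table in TABLE_RE.findall(line):
            counts[table] += 1
    return counts

freqs = table_frequencies(LOG_LINES)
# Frequently queried tables are the prime candidates for storage
# and query optimization.
```

In practice such counts would be aggregated over weeks of logs (e.g. with HiveQL itself or standard Linux text tools) to establish stable usage patterns before redesigning tables.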
We also analyze the system architecture to identify query performance problems. Having confirmed the problems, we propose an optimization scheme covering the logical structure, the physical structure, and the system architecture. For the logical structure, we introduce a new physical table design that combines tables and views. A new file format, named morcfile, is proposed to solve the storage problems. For the system architecture, we design a new architecture to meet the need for fast data processing. Finally, we describe the implementation steps of each optimization, realize the scheme in a test system, and verify its viability and effectiveness.
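The storage optimization rests on columnar layout, the principle behind Hive formats such as RCFile and ORC, on which a modified format like morcfile would build. The toy sketch below shows the idea only; the record fields are hypothetical and the thesis's actual morcfile design is not reproduced here.

```python
# Row-oriented records, as they might arrive from daily financial loads.
# Field names are invented for illustration.
rows = [
    {"trade_id": 1, "amount": 100.0, "dt": "2015-05-31"},
    {"trade_id": 2, "amount": 250.5, "dt": "2015-05-31"},
    {"trade_id": 3, "amount": 75.25, "dt": "2015-06-01"},
]

def to_columnar(rows):
    """Regroup values by column so that a query touching one field
    can skip the bytes of every other field entirely."""
    columns = {key: [] for key in rows[0]}
    for row in rows:
        for key, value in row.items():
            columns[key].append(value)
    return columns

cols = to_columnar(rows)
# A query that only needs 'amount' now scans one contiguous list
# instead of decoding every whole row.
total = sum(cols["amount"])
```

Columnar storage also compresses better, since each column holds values of a single type, which is one reason ORC-style formats reduce both storage footprint and query I/O.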
Keywords/Search Tags: Hadoop, data processing, Hive, data storage optimization, Morcfile