Optimization Of Data Processing Scheme On Financial Big Data System

Posted on: 2016-11-20
Degree: Master
Type: Thesis
Country: China
Candidate: L Tang
Full Text: PDF
GTID: 2298330467491912
Subject: Electronics and Communications Engineering

Abstract/Summary:
With the popularity of information technology, large amounts of data are generated across industries, bringing new challenges to data processing. Big data has received attention from all fields. In the financial sector, data processing plays a central role in IT systems, and an effective data processing scheme is imperative to handle data of ever-increasing volume. Data processing technology has also advanced alongside this growth: the release of Hadoop set a new trend in data processing thanks to its open-source nature, scalability, and effectiveness, and it has been widely adopted in industry.

This paper focuses on data processing in the financial domain and introduces a specialized data processing system based on Hadoop that meets requirements for fast loading, management, and querying. Our work is based on a project for a financial company whose system is built on Hadoop, and it addresses the following problems: data grows every day as the financial business develops, and traditional technology can no longer handle data at such scale; the original Hadoop deployment is applicable to this company, but many problems remain to be solved, such as low utilization of computing resources.

In this dissertation we introduce an effective way to optimize data processing. The dissertation is organized as follows. First, we introduce the key components of Hadoop, including HDFS and MapReduce, and describe in detail the architecture, HiveQL, file formats, and UDFs of Hive, since these are important parts of our system. Then we analyze the original system in depth. To confirm user habits in daily business, we use HiveQL and Linux tools to analyze logs in Hadoop, and we examine the schema and fields of every physical table to find storage and query performance problems.
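The log analysis described above can be sketched as follows. This is a minimal illustration only: the log-line format, table names, and field layout are hypothetical stand-ins, since the thesis does not specify the company's actual log format.

```python
import re
from collections import Counter

# Hypothetical log lines; a real Hive/Hadoop deployment's audit logs
# will differ, and the table names here are invented for illustration.
LOG_LINES = [
    "2015-06-01 09:12:03 user=analyst1 cmd=SELECT * FROM trade_daily WHERE dt='2015-05-31'",
    "2015-06-01 09:15:47 user=analyst2 cmd=SELECT amount FROM trade_daily WHERE dt='2015-05-31'",
    "2015-06-01 10:02:11 user=analyst1 cmd=SELECT * FROM customer_info",
]

# Extract the table name following each FROM clause.
TABLE_RE = re.compile(r"\bFROM\s+(\w+)", re.IGNORECASE)

def table_frequencies(lines):
    """Count how often each physical table appears in the query logs,
    revealing which tables dominate daily business queries."""
    counts = Counter()
    for line in lines:
        for table in TABLE_RE.findall(line):
            counts[table] += 1
    return counts

freqs = table_frequencies(LOG_LINES)
# Frequently queried tables are the prime candidates for storage
# and query optimization.
```

In practice such counts would be aggregated over weeks of logs (e.g. with HiveQL itself or standard Linux text tools) to establish stable usage patterns before redesigning tables.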
We also analyze the system architecture to identify query performance problems. Having confirmed the problems, we propose an optimization scheme covering the logical structure, the physical structure, and the system architecture. For the logical structure, we introduce a new physical table design that combines tables and views. A new file format, named morcfile, is proposed to solve the storage problems. For the system architecture, we design a new architecture to meet the need for fast data processing. Finally, we describe the implementation steps of each optimization, realize the scheme in a test system, and verify its viability and effectiveness.
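The storage optimization rests on columnar layout, the principle behind Hive formats such as RCFile and ORC, on which a modified format like morcfile would build. The toy sketch below shows the idea only; the record fields are hypothetical and the thesis's actual morcfile design is not reproduced here.

```python
# Row-oriented records, as they might arrive from daily financial loads.
# Field names are invented for illustration.
rows = [
    {"trade_id": 1, "amount": 100.0, "dt": "2015-05-31"},
    {"trade_id": 2, "amount": 250.5, "dt": "2015-05-31"},
    {"trade_id": 3, "amount": 75.25, "dt": "2015-06-01"},
]

def to_columnar(rows):
    """Regroup values by column so that a query touching one field
    can skip the bytes of every other field entirely."""
    columns = {key: [] for key in rows[0]}
    for row in rows:
        for key, value in row.items():
            columns[key].append(value)
    return columns

cols = to_columnar(rows)
# A query that only needs 'amount' now scans one contiguous list
# instead of decoding every whole row.
total = sum(cols["amount"])
```

Columnar storage also compresses better, since each column holds values of a single type, which is one reason ORC-style formats reduce both storage footprint and query I/O.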
Keywords/Search Tags: Hadoop, data processing, Hive, data storage optimization, Morcfile