Design And Implementation Of Data Import And Preprocessing System

Posted on:2018-03-18

Degree:Master

Type:Thesis

Country:China

Candidate:Y Yang

Full Text:PDF

GTID:2348330569985791

Subject:Software engineering

Abstract/Summary:

Hadoop was first adopted by companies such as Google and Facebook to solve massive data storage problems, and more and more enterprises now use it to handle big data. As a result, the position of the traditional data warehouses that enterprises have already built is being challenged. This thesis focuses on how Hadoop can work together with a traditional data warehouse, and on how data is transferred, stored, and processed between the two. Adding Hadoop support to an existing data warehouse compensates for the warehouse's deficiencies in massive data processing and storage, and Hadoop's horizontal scalability removes the storage and computing bottleneck of a single warehouse node. Based on current applications of traditional data warehouses and the prospects of the Hadoop big data platform, and in response to the problem that a traditional data warehouse alone cannot meet users' needs, this thesis designs an architecture that combines a traditional data warehouse with data storage and processing on the Hadoop HDFS file system, while also addressing enterprise requirements for controlling user access to data. The system is divided into four parts: data management, data preprocessing, system management, and release management. Together they cover data import, data access control, data preprocessing, and finally data publishing and sharing. The main function of the system is to collect various types of data and preprocess the collected data. The system is also designed to be extensible: machine learning algorithm nodes can be added to enable further data mining. The system is built on the widely used Hadoop infrastructure and migrates and processes data with Hive, a data warehouse in the Hadoop ecosystem, and Sqoop, a data migration tool, which meets the basic needs of enterprises to a certain extent. The system is implemented as a user-friendly Web application, developed with the mature SSM framework to ensure stability. The system is driven by the requirements of large-scale enterprise platforms, but it also takes into account the reuse of the traditional data warehouse and the cooperation between the two. The resulting prototype is intended to provide guidance for practical enterprise applications. This thesis discusses the system in detail, covering its background, requirements, design, implementation, and testing. It explains the significance of the implementation and has a certain practical value.

Keywords/Search Tags:

Hadoop, data warehouse, data preprocessing

Related items

1	The Cooperative Study Between The Hadoop Big Data Platform And The Traditional Data Warehouse
2	Research On The Collaboration Of Hadoop Data Platform And Data Warehouse
3	The Research And Application Of Data Preprocessing In XML Data Warehouse
4	Data Quality Control: Research, Design, And Implementation In Data Preprocessing
5	Research On The Key Technology Of Data Warehouse Based On Hadoop Platform
6	The Research And Implementation Of Key Technologies Of Big Data Preprocessing Based On Hadoop Platform
7	Design And Implementation Of A Hadoop-Based Offline Data Mining System For Electric Power Data
8	The Design And Implementation Of Voice Guide Data Warehouse Based On Big Data
9	Research And Design Of Data Warehouse Model For Mine Geological Data
10	Research On Key Technology In Preprocessing Oriented Web Text Data Warehouse