Design And Implementation Of Data Import And Preprocessing System

Posted on: 2018-03-18  Degree: Master  Type: Thesis
Country: China  Candidate: Y Yang  Full Text: PDF
GTID: 2348330569985791  Subject: Software engineering
Abstract/Summary:
With the development of Hadoop technology, from the early work of companies such as Google and Facebook on massive data storage problems to the growing number of enterprises that now process big data, the position of the traditional data warehouses that enterprises have already built is being challenged. This thesis focuses on how Hadoop can work together with a traditional data warehouse, and on how data is transferred, stored, and processed between them. Adding Hadoop support on top of an existing traditional data warehouse makes up for the warehouse's deficiencies in massive data processing and storage, and Hadoop's horizontal scalability can break through the single-node bottleneck of the traditional data warehouse in storage and computing power. Considering the current use of traditional data warehouses and the prospects of the Hadoop big data platform, this thesis designs an architecture for data storage and processing that combines the traditional data warehouse with the Hadoop HDFS file system. The design addresses the problem that a traditional data warehouse alone cannot meet users' needs, while also satisfying enterprise requirements for controlling user permissions on data. The system is divided into four parts: data management, data preprocessing, system management, and release management. Together they cover data import, data control, data preprocessing, and finally data publishing and sharing. Its main function is to collect various types of data and to preprocess the collected data. The system is also extensible: machine learning algorithm nodes can be added to enable further data mining and processing. The system uses the currently popular Hadoop infrastructure and migrates and processes data with Hive, a data warehouse in the Hadoop ecosystem, and Sqoop, a data migration tool, which meets the basic needs of enterprises to a certain extent. The system is implemented as a user-friendly Web application, developed with the mature SSM framework to ensure stability. It is based on the demands of large-scale enterprise platforms, while also taking into account reuse of the traditional data warehouse and cooperation between the two. The resulting prototype is intended to provide guidance for practical enterprise applications. This thesis discusses the system in detail, covering its background, requirements, design, implementation, and testing. It comprehensively explains the significance of the system implementation, which has some practical value.
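
The data import step described in the abstract (moving a table from a traditional data warehouse into Hive with Sqoop) might look roughly like the sketch below. It is an illustration and not the thesis's actual implementation: the function name `build_sqoop_import`, the JDBC URL, the user name, the file path, and the table names are all hypothetical. The sketch only assembles a Sqoop 1 `import` command line from standard options and does not execute it.

```python
from typing import List


def build_sqoop_import(jdbc_url: str, username: str, password_file: str,
                       source_table: str, hive_table: str,
                       num_mappers: int = 4) -> List[str]:
    """Build the argv list for a Sqoop 1 import of one table into Hive.

    The helper itself is hypothetical; the flags are standard Sqoop 1
    import options.
    """
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    return [
        "sqoop", "import",
        "--connect", jdbc_url,              # JDBC URL of the source warehouse
        "--username", username,
        "--password-file", password_file,   # file holding the password
        "--table", source_table,            # table to read from the warehouse
        "--hive-import",                    # load the imported data into Hive
        "--hive-table", hive_table,         # destination Hive table
        "--num-mappers", str(num_mappers),  # number of parallel map tasks
    ]


# Hypothetical connection details, for illustration only.
cmd = build_sqoop_import(
    jdbc_url="jdbc:mysql://warehouse.example.com:3306/sales",
    username="etl_user",
    password_file="/user/etl/.sqoop_password",
    source_table="orders",
    hive_table="staging.orders",
    num_mappers=2,
)
print(" ".join(cmd))
```

On a host where Sqoop is installed, the resulting list could be passed to `subprocess.run(cmd, check=True)`. Building the command as a list rather than a single string avoids shell quoting problems when table names or paths contain special characters.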
Keywords/Search Tags:Hadoop, data warehouse, data preprocessing