
Research On The Key Technology Of Data Warehouse Based On Hadoop Platform

Posted on: 2018-12-03 | Degree: Master | Type: Thesis
Country: China | Candidate: Z Wu
GTID: 2348330515457445 | Subject: Engineering
Abstract/Summary:
A data warehouse helps enterprises make decisions quickly and correctly by providing an effective way to access their data. Statistics show that enterprise data doubles roughly every 18 months, and building a data warehouse solely on traditional relational databases such as Oracle and MySQL raises many problems involving the performance of the databases, the hosts, the network, and other resources. Internet giants such as Google, Yahoo, Amazon, and Microsoft hold large amounts of data. This exponential growth created many problems, so these companies had to find new technologies for analyzing TB- and PB-scale data and extracting useful information from it, for example to identify popular books and music and to recommend popular news and books to potential customers. These needs drove the rapid development of distributed processing systems such as Hadoop. Hive is widely used to build data warehouses and to overcome the limitations of traditional data warehouses built on relational databases. However, the existing tools are increasingly unable to handle data sets of this size. Google was the first company to propose the MapReduce programming model, which can process PB-scale data in parallel on clusters of inexpensive computers.

Data warehousing is in a stage of rapid development, driven by the exponential growth of enterprise data and the maturing of Hadoop. In the information age, as people use mobile phones, PCs, and the IoT (Internet of Things), enterprises must store and analyze explosively growing data in their data warehouses, and traditional data warehouses can no longer support the storage, processing, and analysis of such huge amounts of data.

In this paper, we propose a method to solve the problems that relational databases cause for the traditional data warehouse. A data warehouse platform built with Hadoop can meet enterprise needs: it combines the computing performance of traditional databases with the Hadoop platform's ability to handle huge amounts of data. Hive, part of the Hadoop ecosystem, can be used to build an enterprise data warehouse and is widely used in Internet companies; data warehouses built with Hive have great development prospects and value. We build a heterogeneous data warehouse with Hadoop and, on this basis, study the data warehouse model, design a hybrid data warehouse architecture, study data warehouse ETL technology, and explore data synchronization across heterogeneous data platforms. Finally, we carry out extensive extraction, transformation, and loading (ETL) work with Kettle, an open-source tool.
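The MapReduce model mentioned above divides a job into a map step, applied independently to each split of the input, and a reduce step that combines the intermediate results grouped by key. Because the map tasks do not depend on each other, they can run in parallel on separate cluster nodes. The minimal Python sketch below illustrates this with the classic word-count example. It is an in-process illustration only, under stated assumptions: a thread pool stands in for the cluster, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, not part of Hadoop's API.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple


def map_phase(split: str) -> List[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle(mapped: Iterable[List[Tuple[str, int]]]) -> Dict[str, List[int]]:
    """Shuffle: group all intermediate values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values that share one key."""
    return key, sum(values)


def word_count(splits: List[str], workers: int = 4) -> Dict[str, int]:
    # Each split is mapped independently, so map tasks can run in parallel
    # (threads here; separate nodes in a real, hypothetical cluster job).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, splits))
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    splits = ["Hadoop stores data", "Hive queries data", "data warehouse on Hadoop"]
    print(word_count(splits))
```

Across the three example splits, "data" is counted three times and "hadoop" twice. In a real Hadoop job the framework itself performs the shuffle (partitioning and sorting intermediate keys across reducers), and the reduce tasks also run in parallel.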
Keywords/Search Tags: Data Warehouse, Hadoop, heterogeneous data, ETL
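The ETL and data-synchronization work described in the abstract is carried out with Kettle. One common way to keep a warehouse in step with an operational database, shown here as an assumption rather than as the thesis's own method, is incremental extraction with a timestamp watermark: each run extracts only the rows changed since the previous run, transforms them, and upserts them into the warehouse table. The sketch uses Python's built-in `sqlite3` in-memory databases to stand in for both the relational source (for example MySQL) and the Hadoop-side target table; the schema, the table names `orders` and `fact_orders`, and the column `updated_at` are hypothetical.

```python
import sqlite3
from typing import List, Tuple

# (order_id, customer, amount, updated_at); hypothetical schema for illustration
Row = Tuple[int, str, float, str]
SCHEMA = "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, updated_at TEXT)"


def extract(src: sqlite3.Connection, watermark: str) -> List[Row]:
    """Extract only rows changed after the last successful run (the watermark)."""
    return src.execute(
        "SELECT order_id, customer, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()


def transform(rows: List[Row]) -> List[Row]:
    """Clean fields: trim and upper-case customer names, round amounts to cents."""
    return [(oid, cust.strip().upper(), round(amt, 2), ts) for oid, cust, amt, ts in rows]


def load(dst: sqlite3.Connection, rows: List[Row]) -> None:
    """Upsert by primary key so a re-run batch does not create duplicates."""
    dst.executemany("INSERT OR REPLACE INTO fact_orders VALUES (?, ?, ?, ?)", rows)
    dst.commit()


def sync(src: sqlite3.Connection, dst: sqlite3.Connection, watermark: str) -> str:
    """Run one incremental ETL pass and return the new watermark."""
    rows = transform(extract(src, watermark))
    load(dst, rows)
    # ISO-8601 strings in one fixed format sort chronologically, so max() works.
    return max((ts for _, _, _, ts in rows), default=watermark)


def demo() -> Tuple[str, List[Row]]:
    src = sqlite3.connect(":memory:")
    dst = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE orders " + SCHEMA)
    dst.execute("CREATE TABLE fact_orders " + SCHEMA)
    src.executemany(
        "INSERT INTO orders VALUES (?, ?, ?, ?)",
        [(1, " alice ", 10.004, "2018-01-01T10:00:00"),
         (2, "bob", 20.5, "2018-01-01T11:00:00")],
    )
    wm = sync(src, dst, "1970-01-01T00:00:00")  # first run: every row is extracted
    src.execute(
        "UPDATE orders SET amount = 25.0, updated_at = '2018-01-02T09:00:00' "
        "WHERE order_id = 2"
    )
    wm = sync(src, dst, wm)  # second run: only order 2 is extracted
    return wm, dst.execute("SELECT * FROM fact_orders ORDER BY order_id").fetchall()


if __name__ == "__main__":
    print(demo())
```

The upsert in `load` is the key design choice: re-running a batch overwrites rows by primary key instead of duplicating them. A strict `>` comparison on the watermark can miss rows that share the last timestamp but commit after the run, so real pipelines often re-read a small overlap window.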