
The Cooperative Study Between The Hadoop Big Data Platform And The Traditional Data Warehouse

Posted on: 2015-01-22
Degree: Master
Type: Thesis
Country: China
Candidate: S Y Fei
Full Text: PDF
GTID: 2268330428956313
Subject: Computer technology
Abstract/Summary:
As Hadoop has matured, more and more companies have adopted it as their tool for handling big data. Its design grew out of the large-scale storage and processing systems pioneered at Google, and companies such as Facebook were early adopters for storing very large volumes of data. As a result, the traditional data warehouses that enterprises already run are being challenged. This paper studies how a traditional data warehouse and Hadoop can coordinate and divide the work of data collection, transport, storage and processing. (Unless stated otherwise, "traditional data warehouse" in this paper means a single-node relational data warehouse.) Hadoop support is built on top of the existing data warehouse, so that the warehouse's weaknesses in storing and processing big data are compensated, and the storage and compute bottlenecks of a single-node warehouse are relieved by Hadoop's ability to scale horizontally.
Based on how traditional data warehouses are used today and on the expected development of the Hadoop big data platform, this paper proposes a new framework in which Hadoop and the traditional data warehouse cooperate, addressing the problem that the traditional data warehouse alone can hardly meet customers' demands. The framework originates from ideas put forward by designers at Cloudera and Teradata. In this paper the architecture is divided into three modules: data acquisition, data storage and data applications. The paper mainly discusses the collection, storage and application of structured and unstructured data, and examines how Hadoop and the traditional data warehouse collaborate in data storage and data application.
For data collection and transfer, this paper uses Apache Sqoop, and it relies on the Hadoop HDFS file system and the Hive data warehouse to store the data. It also describes how the data is used in Hive. A prototype system demonstrates the feasibility of the proposed architecture. Because enterprises adopting a big data platform need to reuse their existing traditional data warehouse, this paper also studies the cooperative relationship between the two. The demo system implemented in this paper offers guidance for enterprises building such a platform.
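
The abstract does not show the commands used in the prototype, so the following is only a minimal sketch of how a table might be copied from a relational warehouse into Hive with the Sqoop 1 command-line tool. The helper name build_sqoop_import, the JDBC URL, the user, the password file path and the table names are all hypothetical and chosen for illustration; only the Sqoop flags themselves are standard.

```python
import shlex
from typing import List


def build_sqoop_import(jdbc_url: str, username: str, password_file: str,
                       table: str, hive_table: str,
                       mappers: int = 4) -> List[str]:
    """Build the argument list for a Sqoop 1 import into Hive.

    Hypothetical helper: it only assembles the command and does not run it.
    """
    if mappers < 1:
        raise ValueError("mappers must be at least 1")
    return [
        "sqoop", "import",
        "--connect", jdbc_url,             # JDBC URL of the relational warehouse
        "--username", username,
        "--password-file", password_file,  # file holding the password
        "--table", table,                  # source table in the warehouse
        "--hive-import",                   # load the imported data into Hive
        "--hive-table", hive_table,        # target Hive table
        "--num-mappers", str(mappers),     # number of parallel map tasks
    ]


if __name__ == "__main__":
    # All values below are placeholders, not taken from the thesis.
    args = build_sqoop_import(
        jdbc_url="jdbc:mysql://dw-host/sales",
        username="etl",
        password_file="/user/etl/.sqoop-password",
        table="orders",
        hive_table="staging.orders",
    )
    print(" ".join(shlex.quote(a) for a in args))
```

Building the command as a list rather than a single string keeps each argument separate, so it can be passed to subprocess.run without shell quoting problems on a machine where Sqoop is installed.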
Keywords/Search Tags:Hadoop, Data warehouse, Big data, Relational database