The Research And Implementation Of Data Warehouse For Logistics Based On Hive

Posted on: 2017-01-28
Degree: Master
Type: Thesis
Country: China
Candidate: W Wang
Full Text: PDF
GTID: 2308330503953781
Subject: Software engineering
Abstract/Summary:
In recent years, with the development and application of big data technology, Hadoop has been widely recognized by academia and industry. Hive, the open source data warehouse built on top of a Hadoop cluster, offers schema flexibility, high scalability, and high fault tolerance, and it meets the needs of an enterprise data warehouse well. As a result, more and more logistics enterprises are considering how to use the advantages of a Hive data warehouse to improve their information systems.
This paper designs the overall architecture of a logistics data warehouse based on Hive, chooses the development language, and analyzes the implementation method. The work is set against the intelligent logistics big data platform project of a logistics information system software company (hereinafter referred to as DK) and draws on a thorough study of the logistics company's business needs.
Traditional logistics enterprise data warehouses suffer from poor extensibility, a low degree of operational automation, and inefficient data processing. To address these problems, this paper proposes a concrete implementation scheme for a Hive-based logistics data warehouse that builds on the virtualization technology of a university cloud platform to provide high scalability. In addition, the data extraction, transformation, and loading (ETL) process and the data query analysis process run fully automatically, without manual intervention. By exploiting MapReduce parallel computing, the warehouse can also support large-scale logistics data processing.
First, this paper reviews the state of research in China and abroad and the related Hadoop technologies, compares the Hive data warehouse with relational databases and traditional data warehouses, examines the advantages and disadvantages of the Hive data warehouse, and identifies its application scenarios. Second, taking the DK logistics data platform project as the background, it analyzes the requirements, designs the architecture, and analyzes the implementation method of the Hive-based logistics data warehouse. Third, a big data processing platform based on virtualization is built, with the Hive and Sqoop environments deployed on the university cloud platform. The Hive-based logistics data warehouse consists of a data ETL process and a data query analysis process. The work covers the scalability of the data warehouse, automated multi-threaded ETL scripting and a study of the optimal number of threads, Hive data storage analysis, Hive data preprocessing, query analysis processing, and post-processing scripts. Finally, the value of the project is evaluated using the results of running the Hive data warehouse, which show that the system can support enterprise management decision-making from multiple perspectives.
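The abstract mentions automated multi-threaded ETL scripting and a study of the optimal number of threads but does not show the scripts. The following is only a minimal sketch of that idea, not the thesis's actual implementation. The function names, the table list, the JDBC URL, and the Hive database name are all hypothetical, and the `runner` callable is injected so that the sketch can be tried without a Sqoop installation. It assumes one Sqoop import command per source table, run from a thread pool and timed for several candidate thread counts.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def build_sqoop_import(table, jdbc_url, hive_db):
    """Build a Sqoop command line that imports one source table into Hive.

    The table, URL, and database names are placeholders for illustration.
    """
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--hive-import",
        "--hive-table", f"{hive_db}.{table}",
        "--num-mappers", "1",
    ]


def run_etl(tables, jdbc_url, hive_db, workers, runner):
    """Run one import per table on `workers` threads.

    `runner` takes a command (list of strings) and returns a status code.
    Returns (list of status codes in table order, elapsed seconds).
    """
    commands = [build_sqoop_import(t, jdbc_url, hive_db) for t in tables]
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(runner, commands))
    return results, time.perf_counter() - start


def best_worker_count(tables, jdbc_url, hive_db, candidates, runner):
    """Time the ETL run for each candidate thread count and pick the fastest."""
    timings = {
        n: run_etl(tables, jdbc_url, hive_db, n, runner)[1] for n in candidates
    }
    return min(timings, key=timings.get), timings
```

On a real cluster, `runner` could be something like `lambda cmd: subprocess.run(cmd).returncode`. A single timing per thread count is noisy, so a real study would presumably repeat each measurement several times.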
Keywords/Search Tags:intelligent logistics big data platform, Hive data warehouse, ETL, query analysis