
Design And Implementation Of Hive-based Purchase And Sale Data Warehouse System

Posted on: 2021-03-26 | Degree: Master | Type: Thesis
Country: China | Candidate: Y N Li | Full Text: PDF
GTID: 2518306461969129 | Subject: Master of Engineering
Abstract/Summary:
Advances in science and technology have changed the way people work and live. Artificial intelligence, big data, and the other technologies promoted in the current era all take data as their cornerstone. The value of data is increasingly evident in every aspect of life, and for an enterprise it plays a crucial role. Enterprises accumulate massive amounts of data through daily management and production, and with the arrival of the data technology (DT) era, how to properly handle and use this data has become an important concern for all of them. Big data technology emerged against this background and has developed steadily since. Within this wave, Hadoop has been widely recognized by academia and industry. Hive, the open-source data warehouse application that runs on top of a Hadoop cluster, is schema-free, highly scalable, and highly fault-tolerant, so it serves the construction of enterprise data warehouses well. As a result, more and more enterprises are considering how to use Hive's advantages to build their own data warehouses and raise their information construction to a new level.

A data warehouse is a subject-oriented, processed, and integrated collection of data that is relatively stable and reflects changes over time. However, the traditional data warehouse has low scalability and low fault tolerance, among other limitations, and handles large-scale data poorly, so it has fallen out of step with the times and cannot play its due role. Querying is an important part of data warehouse operation: data in the warehouse is retained for a long time so that users can query it directly. Against the background of big data, a new data warehouse construction scheme is urgently needed.

In response to these problems, this article puts forward a new approach to data warehouse construction in the era of big data, one that can better serve the information construction of enterprises today. An enterprise data warehouse provides data support for the enterprise and makes data management and data mining easier; it is the core of enterprise information construction. The article starts with the background and significance of data warehouse construction and then analyzes in depth the current research results on data warehouse technology at home and abroad. Building on existing technology, it adopts automated data processing and a novel data layering approach. Demand analysis and design of the target system show that data processing, data modeling, data warehouse management, and visualization are its four important components. Therefore, the main research contents of this article are:

1. Data processing. The Hadoop platform provides an efficient and inexpensive data processing platform for the data warehouse. Hive SQL is used to simplify data processing, Shell scripts designed for this work automate the ETL process, and the optimal number of threads is obtained through experimental analysis. YARN handles resource management to improve the stability and scalability of the data warehouse, and the parallel computing of MapReduce supports the large-scale data processing of enterprises well.

2. Data modeling. The core of data modeling is layer-by-layer decoupling: the closer a layer is to the bottom, the closer it is to the record of business events as they happened, and the closer it is to the top, the closer it is to business goals. The data warehouse is built with the dimensional modeling method, and the data is layered during modeling. This makes the data more reliable and the data structure clearer, makes it convenient to trace data lineage, and greatly facilitates data development.

3. Visualization. Spring MVC + Spring + MyBatis + ECharts serve as the framework of the entire system, presenting the key data that enterprises care about to support their decisions.

4. Testing. Corresponding test cases are designed for each functional module. The performance, functionality, and security of the system are tested against these cases; the results are within expectations and the system operates normally.
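
The lineage tracing that layered modeling makes convenient can be illustrated with a minimal Python sketch. This is only an illustration under stated assumptions, not the thesis's implementation: the table names and the ods/dwd/dws/ads layer prefixes are hypothetical (the abstract does not name its layers), and `trace_lineage` is a helper introduced here.

```python
from collections import deque

# Hypothetical tables and layer prefixes (ods/dwd/dws/ads); the abstract
# does not name them. Each table maps to the upstream tables it is built from.
UPSTREAM = {
    "ads_sales_summary": ["dws_sales_daily"],
    "dws_sales_daily": ["dwd_sale_order", "dim_product"],
    "dwd_sale_order": ["ods_sale_order"],
    "dim_product": ["ods_product"],
    "ods_sale_order": [],
    "ods_product": [],
}


def trace_lineage(table, upstream=UPSTREAM):
    """Return every table that `table` depends on, nearest first (breadth-first)."""
    seen = []
    queue = deque(upstream.get(table, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited via another path
        seen.append(current)
        queue.extend(upstream.get(current, []))
    return seen


if __name__ == "__main__":
    # Walk from a top-layer report table down to the raw source tables.
    print(trace_lineage("ads_sales_summary"))
```

Because each layer reads only from the layers beneath it, the dependency map is acyclic and a top-layer table can be walked back to the raw business records it derives from.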
Keywords/Search Tags:Data Warehouse, Hadoop cluster, Hive, MapReduce, Data visualization