Design And Implementation Of E-commerce Multidimensional Analysis System Based On Hive

Posted on: 2021-02-17
Degree: Master
Type: Thesis
Country: China
Candidate: Y Jia
Full Text: PDF
GTID: 2428330623967319
Subject: Electronic and communication engineering
Abstract/Summary:
In recent years, China's big data sector has made remarkable progress in policy, technology, industry, and applications, and the scale of China's digital economy has grown significantly. At the same time, e-commerce companies hold a resource advantage that other industries and enterprises cannot match: the most accurate and comprehensive user data. Applying massive data to business operations and business support will therefore be an important means for e-commerce companies to transform their data services and seize the market initiative. However, heterogeneous data sources and the conversion of historical data on the TB to PB scale into applications pose great challenges to multidimensional analysis with a traditional EDW (Enterprise Data Warehouse).

To address the high operation and maintenance cost of traditional data warehouses built on large-scale servers, and the limitation of iterating Internet products on the basis of staff experience, this thesis designs and develops a Hive-based multidimensional business analysis system that drives decision-making and intelligent operations with data and improves the reusability of data resources. The system is built on a CDH-based big data platform, on which a four-tier Hive data warehouse is implemented. Building the warehouse in layers greatly improves data governance and safeguards data quality. Job scheduling, traditionally done with Linux crontab, is improved by integrating Azkaban; data jobs are monitored comprehensively; and indicators such as e-commerce user activity are implemented.

The main work of this thesis is as follows:

1. The technologies behind existing big data platforms are studied, an enterprise-level big data platform is built on CDH, and a Hive data warehouse is designed and implemented on that platform.
2. A new self-developed component, a Kafka-based Pipeline acquisition module, is proposed and designed. It solves the problem of loading and storing heterogeneous data sources, guarantees data consistency during large-scale data migration, and keeps data quality under control.
3. A four-tier warehouse data model based on Hive is proposed and designed. It manages data at a different granularity in each tier of the warehouse, speeds up queries and data computation, and provides data visualization with the SSM framework.
4. The Azkaban job scheduling system is integrated to overcome the difficulty of writing, maintaining, and upgrading reports by hand in the warehouse. Automatic job scheduling for the data warehouse is designed and implemented, and system testing is completed.
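As an illustration of why dependency-aware scheduling suits a tiered warehouse better than fixed crontab times, the following minimal sketch orders four jobs, one per tier, so that each runs only after the tier it reads from has finished. It is not Azkaban's API and not the thesis's implementation. The tier prefixes (ods, dwd, dws, ads), the job names, and the helper functions are all assumptions made for this example; the abstract states only that the warehouse has four tiers. It requires Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter

# Hypothetical job names, one per tier; the abstract does not name the tiers.
# Each job maps to the set of jobs that must finish before it can start.
JOB_DEPENDENCIES = {
    "ods_load_raw_logs": set(),
    "dwd_clean_events": {"ods_load_raw_logs"},
    "dws_daily_user_summary": {"dwd_clean_events"},
    "ads_user_activity_report": {"dws_daily_user_summary"},
}


def execution_order(dependencies):
    """Return one valid run order in which every job follows its prerequisites."""
    return list(TopologicalSorter(dependencies).static_order())


def run_all(dependencies, run_job):
    """Run jobs in dependency order and return the names of the jobs that succeeded.

    Stops at the first failure so that downstream tiers never read incomplete data.
    """
    completed = []
    for job in execution_order(dependencies):
        if not run_job(job):
            break
        completed.append(job)
    return completed


if __name__ == "__main__":
    # Stand-in runner that always succeeds; a real one might submit a Hive script.
    print(run_all(JOB_DEPENDENCIES, lambda job: True))
```

With time-based scheduling, a late or failed upstream job does not stop the downstream job from starting on schedule. Expressing the dependencies explicitly, as a workflow scheduler does, lets a failure halt everything downstream of it.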
Keywords/Search Tags: Data Warehouse, Big data analysis platform, Hive, ETL, CDH