
Implementation And Application Of E-commerce Data Analysis Platform Based On Hive

Posted on: 2022-07-26
Degree: Master
Type: Thesis
Country: China
Candidate: X J Hu
GTID: 2518306554982499
Subject: Electronics and Communications Engineering
Abstract/Summary:
With the rapid development of the Internet and the spread of smartphones, more and more people are switching from offline to online shopping, because online shopping is cheaper and more convenient than offline stores. This shift has driven the rapid growth of many e-commerce platforms. Pinduoduo, for example, rose to second place among domestic e-commerce platforms within just a few years by attracting large numbers of users in lower-tier markets. Amid fierce competition among e-commerce platforms, how to capture user traffic, improve user stickiness, and increase GMV has become a problem that every platform must consider. With data growing explosively in both volume and variety, enterprises must be able to extract value from large amounts of data. Faced with these opportunities and challenges, it is imperative for e-commerce enterprises to build a data platform that integrates massive-scale data collection, storage, computation, and analysis.

This thesis first reviews the background of e-commerce big data and then examines the frameworks and technologies used in the data analysis platforms of several domestic and foreign companies. The key technologies used in this platform are then briefly introduced and analyzed. Finally, based on the general needs of most current e-commerce companies, an outline design for an offline e-commerce data warehouse platform is proposed, incorporating some innovations of its own. Each module is then designed and implemented in detail according to the requirements and the outline design.

The platform consists of three modules: a data acquisition module, a data warehouse, and a data visualization module. The data acquisition module collects user behavior data and business data. The user behavior data falls into five categories: startup, page, event, exposure, and error. The business data covers orders, users, commodities, activities, and regions. The data warehouse handles data cleaning, modeling, and analysis, and is divided into five layers: the ODS layer (Operational Data Store), the DWD layer (Data Warehouse Detail), the DWS layer (Data Warehouse Service), the DWT layer (Data Warehouse Topic), and the ADS layer (Application Data Store). The data visualization module creates MySQL tables that mirror the ADS-layer result tables, uses Sqoop to export the ADS-layer results to MySQL on a regular schedule, and then displays the data with a data visualization tool.

The platform uses Flume, Kafka, and Sqoop as data acquisition tools, HDFS as the data storage framework, Hive as the data warehouse tool, and Spark as the computing engine for Hive. Based on the data analysis requirements of e-commerce websites in the current big data era, an end-to-end e-commerce big data analysis platform has been built, covering data acquisition, data storage, data analysis, and data presentation. All three modules have passed functional tests and run well, which verifies that the implementation is consistent with the expected results. If applied in e-commerce enterprises, the platform can reduce duplicated data development work, improve efficiency, and support operational decisions through the analysis of various indicators.
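
As a rough illustration of how user behavior data might move through the warehouse layers, the following Python sketch routes raw log records into the five behavior categories named above and then counts them per day. The record layout (a JSON object with a `type` field and a `dt` date field) and the function names `split_by_category` and `daily_counts` are assumptions made for this example only. The thesis implements these steps with Hive running on Spark, not in Python.

```python
import json
from collections import Counter, defaultdict

# The five user behavior categories listed in the abstract.
CATEGORIES = ("startup", "page", "event", "exposure", "error")


def split_by_category(raw_lines):
    """ODS -> DWD style step: parse raw JSON log lines and group them by category.

    Lines that are not valid JSON objects, or whose "type" field (hypothetical
    name) is not one of the five categories, are dropped as a simple form of
    data cleaning.
    """
    detail = defaultdict(list)
    for line in raw_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(record, dict) and record.get("type") in CATEGORIES:
            detail[record["type"]].append(record)
    return detail


def daily_counts(detail):
    """DWD -> ADS style step: count records per (date, category) pair."""
    counts = Counter()
    for category, records in detail.items():
        for record in records:
            counts[(record.get("dt"), category)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        '{"type": "page", "dt": "2020-06-14"}',
        '{"type": "error", "dt": "2020-06-14"}',
        "not json",
    ]
    for key, value in sorted(daily_counts(split_by_category(sample)).items()):
        print(key, value)
```

In a real deployment the per-day counts would be written to an ADS table and exported to MySQL, as the abstract describes; the in-memory `Counter` here only stands in for that result table.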
Keywords/Search Tags:Hive, Hadoop, Data warehouse, Big data, Data analysis