Design And Implementation Of A Hadoop-based Data Analysis System For E-commerce

Posted on: 2017-09-26
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Wan
GTID: 2359330515959788
Subject: Computer application technology
Abstract/Summary:
As visit traffic and trading volume grow massively, and as service types and workflows become increasingly complicated, existing business intelligence and analytics software cannot process vast amounts of data efficiently or fulfill all the requirements of a particular enterprise. To meet this challenge, we designed and implemented a one-stop Hadoop-based data analysis system for e-commerce. The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing. It includes a distributed file system called HDFS and a YARN-based system for parallel processing of large data sets called MapReduce, which together allow distributed processing across clusters of commodity machines. Following the workflow, our system consists of four main modules: ETL (Extract, Transform and Load), data modeling, OLAP (Online Analytical Processing) and data visualization. We extract data from the RDB (relational database), transform and clean it, and then store it in the data warehouse. According to the customized requirements, we build a logical model of the tables in the warehouse and design the connection scheme between the tables. We then use Apache Hive to analyze the data, encapsulate the required tables, and load them into the visualization software to provide decision support for business stakeholders.
Keywords/Search Tags: E-commerce, Business Intelligence, Hadoop, Hive, Data Analysis
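The four-stage workflow named in the abstract (extract from a relational database, transform and clean, load into a warehouse, then aggregate for visualization) can be illustrated with a minimal sketch. This is not the thesis's implementation: it assumes in-memory SQLite databases as stand-ins for both the source RDB and the Hive-backed warehouse, and the table names `raw_orders` and `fact_orders`, the columns, and the sample rows are all hypothetical.

```python
import sqlite3


def extract(source):
    """Extract: read raw order rows from the (hypothetical) source table."""
    return source.execute(
        "SELECT order_id, category, amount FROM raw_orders"
    ).fetchall()


def transform(rows):
    """Transform: drop rows with missing or non-positive amounts
    and normalize the category label."""
    cleaned = []
    for order_id, category, amount in rows:
        if amount is None or amount <= 0:
            continue
        label = (category or "unknown").strip().lower()
        cleaned.append((order_id, label, float(amount)))
    return cleaned


def load(warehouse, rows):
    """Load: write the cleaned rows into a (hypothetical) fact table."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders ("
        "order_id INTEGER PRIMARY KEY, category TEXT, amount REAL)"
    )
    warehouse.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)
    warehouse.commit()


def sales_by_category(warehouse):
    """OLAP-style roll-up: order count and revenue per category,
    in the shape a visualization tool could consume."""
    return warehouse.execute(
        "SELECT category, COUNT(*), SUM(amount) "
        "FROM fact_orders GROUP BY category ORDER BY category"
    ).fetchall()


def run_pipeline():
    # Assumption: SQLite stands in for the source relational database.
    source = sqlite3.connect(":memory:")
    source.execute(
        "CREATE TABLE raw_orders (order_id INTEGER, category TEXT, amount REAL)"
    )
    source.executemany(
        "INSERT INTO raw_orders VALUES (?, ?, ?)",
        [
            (1, " Books ", 12.5),
            (2, "books", 7.5),
            (3, "Toys", None),   # missing amount, removed during cleaning
            (4, "TOYS", 20.0),
            (5, None, 5.0),      # missing category, mapped to "unknown"
        ],
    )
    # Assumption: a second SQLite database stands in for the warehouse.
    warehouse = sqlite3.connect(":memory:")
    load(warehouse, transform(extract(source)))
    return sales_by_category(warehouse)


if __name__ == "__main__":
    for category, orders, revenue in run_pipeline():
        print(category, orders, revenue)
```

In a Hadoop deployment the final roll-up would more plausibly be a HiveQL `GROUP BY` query over warehouse tables stored in HDFS; the SQL here only mirrors that shape on a single machine.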