The Design And Implementation Of Online Retailers Data Analysis System Based On Hadoop

Posted on: 2018-09-12
Degree: Master
Type: Thesis
Country: China
Candidate: M D Sun
Full Text: PDF
GTID: 2348330542958181
Subject: Software engineering
Abstract/Summary:
With the development of computer, network, and Internet technology, and with the automation of business processes in all walks of life, the data generated by industry applications has grown explosively, and its volume is frequently measured in terabytes. These data, and the information derived from them, accurately record how an enterprise operates. In this era of information explosion, traditional tools have many technical shortcomings and cannot cope effectively with ever-growing data. The data are diverse, so unifying formats during data fusion is difficult; single-machine storage capacity is limited and query performance is low; and single-machine data analysis can handle only simple, small-scale data, runs algorithms slowly, and makes deep data mining difficult. People have therefore continued to explore new tools for analyzing how enterprises operate and for providing valuable information to decision makers.

As Internet-led information technology has developed further, the storage, management, and analysis capabilities of traditional hardware and software are no longer enough to support such large volumes of data, and dedicated distributed big data processing technology has emerged in response. Today, Hadoop is the mainstream big data processing platform in industry. Since its introduction, its wide range of practical applications in big data has led to broad adoption in industry and commerce and to extensive study in academia. Within a few years, Hadoop became the most successful and most widely used big data processing system and platform and a de facto standard for big data. Many industries have explored and applied it further, and it is especially widely used in the Internet industry.

Because traditional data analysis runs on a single machine, system performance suffers significantly when large amounts of data are processed. To address this problem, and based on an in-depth analysis of the technologies of the Hadoop big data platform, this thesis proposes a Hadoop-based e-commerce data analysis system that helps enterprises use effective data analysis methods to make better business decisions. The scheme uses Flume to collect massive user behavior data from e-commerce websites and stores it in the Hadoop Distributed File System (HDFS), processes the data with the MapReduce computing framework, and uses Hive for statistical analysis of the data along different dimensions. Finally, the thesis proposes a recommendation algorithm for users that combines an improved K-means clustering with hybrid collaborative filtering.

Following the requirements analysis, the system architecture and business processes are designed in detail. The system is divided into four modules: a data collection module, a data analysis module, a data display module, and an application module, each of which is designed and implemented in detail. Finally, the system is used to analyze the log files and product reviews of an e-commerce company, and user recommendation is then tested on that basis. The analysis results help the company better understand how its website is used and how its users behave, so that it can identify problems in the website, its marketing channels, and its marketing environment, make marketing more precise, and improve the company's returns.

The thesis also introduces the objectives and process of analyzing and mining product reviews, presents a visual analysis of the review data, and proposes a method based on word segmentation and scoring for sentiment analysis of the reviews. It further proposes an improved K-means clustering and collaborative filtering recommendation algorithm, combines the algorithm with Hadoop big data technology, and applies it in practice to meet business needs of e-commerce enterprises such as user data analysis and recommendation.
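
The abstract does not say how the K-means step is improved or how it is combined with collaborative filtering, so the following is only a rough sketch of the general idea, not the author's algorithm. It assumes plain Lloyd's K-means over user rating vectors, followed by user-based collaborative filtering restricted to users in the same cluster. The function names, the seeding rule, and the toy ratings are all hypothetical, and it runs on a single machine rather than on Hadoop.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def kmeans(vectors, k, iterations=20):
    """Plain Lloyd's k-means. Assumption: the first k vectors seed the centroids."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(v, centroids[c])))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def recommend(ratings, user, k=2, top_n=1):
    """Recommend items the user has not rated, using neighbours in the same cluster."""
    users = sorted(ratings)
    items = sorted({item for r in ratings.values() for item in r})
    # Missing ratings are treated as 0 (a simplifying assumption).
    vectors = [[ratings[u].get(item, 0.0) for item in items] for u in users]
    labels = kmeans(vectors, k)
    me = users.index(user)
    scores = {}
    for idx, other in enumerate(users):
        if idx == me or labels[idx] != labels[me]:
            continue  # only users in the same cluster act as neighbours
        sim = cosine(vectors[me], vectors[idx])
        for item, rating in ratings[other].items():
            if item not in ratings[user]:
                # Similarity-weighted sum of neighbour ratings.
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]

# Hypothetical toy data: user -> {item: rating}.
ratings = {
    "alice": {"phone": 5, "case": 4},
    "bob": {"phone": 4, "case": 5, "charger": 5},
    "carol": {"novel": 5, "lamp": 4},
    "dave": {"novel": 4, "lamp": 5, "bookmark": 4},
}
print(recommend(ratings, "alice"))
```

Clustering first limits the similarity search to one cluster instead of all users, which is one common reason for pairing the two techniques on large user sets.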
Keywords/Search Tags: Hadoop, MapReduce, Hive, data analysis, sentiment analysis, collaborative filtering