
Research And Implementation Of User Behavior Analysis System Based On Spark

Posted on: 2020-11-23
Degree: Master
Type: Thesis
Country: China
Candidate: S Xiao
Full Text: PDF
GTID: 2428330572981325
Subject: Electronics and Communications Engineering
Abstract/Summary:
In recent years, the rapid development of Internet technology and the rapid growth in the number of network users have generated a large volume of user data, which brings both opportunities and challenges to Internet enterprises. On the one hand, analyzing this data helps enterprises understand their users better and make correct decisions in time, which brings them great value. On the other hand, large-scale user behavior data poses a major technical challenge in terms of storage, computation, analysis and application. Handling massive real-time behavior data generated within a short time, in particular, has long been a difficult problem in the big data industry.

The open source community has produced many big data analysis platforms that deserve attention, such as Hadoop, the well-known distributed system infrastructure developed by Apache. Its two core components, the HDFS distributed file system and the MapReduce programming model, provide storage and computation for massive data. Because of its high fault tolerance, high reliability and low cost, Hadoop has been widely adopted by enterprises. However, to process massive data with Hadoop, users must develop MapReduce programs themselves. MapReduce is criticized for being difficult to program, and its biggest disadvantage is that it does not meet the needs of real-time applications. Spark, an open source parallel computing framework similar to Hadoop MapReduce, developed by the AMPLab at the University of California, Berkeley, supports not only the offline data processing that MapReduce provides but also real-time processing of massive data. It is easier to program than MapReduce and processes the same amount of data faster.

Based on the current situation of e-commerce enterprises in the era of big data, this thesis uses Spark, a widely used framework in the big data field, together with related components, to analyze and process the large volume of user behavior data generated by commercial websites. The main work and innovations of this thesis are as follows. First, in view of the predicament enterprises face when dealing with large data volumes, the thesis investigates the technologies that can be used in a user behavior analysis system, analyzes the shortcomings of Hadoop in big data processing, and chooses Spark to build the system. Second, a user behavior analysis system based on Spark is designed and implemented, and the design of each module is described in detail. The system is implemented in a distributed cluster environment, and its functional modules cover both offline processing of user behavior data and real-time online processing of user data. Finally, data visualization is used to display the analysis results on a web page.

The resulting user behavior analysis system helps enterprise managers find possible problems in the marketing and operation of their products, makes marketing more accurate and effective, and addresses the business pain points and technical difficulties that e-commerce enterprises encounter as they develop.
Keywords/Search Tags: Network User Behavior Analysis, Flow computation, Off-line analysis, Spark