Analysis And Research On User Online Shopping Behavior Based On Hadoop

Posted on: 2022-10-15
Degree: Master
Type: Thesis
Country: China
Candidate: J X Zhai
GTID: 2518306566977539
Subject: Master of Engineering
Abstract/Summary:
With the rapid development of the Internet, large amounts of user data are continuously transmitted to and stored on servers. How to mine this data and convert it into benefits has become a research focus in many fields. Traditional data storage and clustering algorithms can no longer meet these needs when faced with massive amounts of data, and the rise of Hadoop has provided a new direction for clustering algorithms that process data at this scale. Hadoop is a framework composed of multiple components that performs distributed processing of data. Its core components are the Hadoop Distributed File System (HDFS) and MapReduce. HDFS locates and stores files through the joint work of the nodes in the cluster, while the MapReduce programming framework combines user-defined functions with its own components to implement parallel data computation.

K-means is a classic and widely used clustering algorithm. However, its initial center points are often selected manually, which can cause large deviations in the results. In addition, its efficiency drops significantly on massive data. The main work of this thesis is as follows. First, to address these shortcomings, two improved schemes are proposed, based on density weight and on combination with the Canopy algorithm. Then, the algorithm is combined with MapReduce to parallelize it. The parallelization process is also optimized, and a half-area calculation method is proposed. To verify the actual efficiency of the optimized algorithm, this thesis uses user data from Jing Dong to compare the performance of the traditional K-means algorithm and the optimized K-means algorithm on a Hadoop platform built on virtual machines. In the distributed cluster environment, a user online shopping behavior analysis system is designed and implemented. The improved algorithm is used to process user behavior data offline, and the results are displayed through a visualization page.
Keywords/Search Tags: K-means, Canopy, big data, clustering algorithm, user behavior
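
The abstract describes seeding K-means with Canopy and then running the iterations as MapReduce jobs, but it does not give the details. The following is a minimal single-process Python sketch of that general idea, not the thesis implementation. It assumes Euclidean distance, a simplified Canopy pass that uses only a tight threshold `t2` to pick centers, and a map step and reduce step written as plain functions rather than Hadoop jobs. All function names and the threshold value are hypothetical. The density-weight scheme and the half-area calculation method are not shown, because the abstract does not specify them.

```python
import math
from collections import defaultdict


def distance(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopy_centers(points, t2):
    """Simplified Canopy pass: pick centers so no two are within t2.

    Assumption: only the tight threshold t2 is used, since this sketch
    needs the centers and not the overlapping canopy memberships.
    """
    remaining = list(points)
    centers = []
    while remaining:
        center = remaining.pop(0)
        centers.append(center)
        # Points close to the new center cannot become centers themselves.
        remaining = [p for p in remaining if distance(p, center) > t2]
    return centers


def map_assign(point, centers):
    """Map step: emit (index of nearest center, (point, 1))."""
    idx = min(range(len(centers)), key=lambda i: distance(point, centers[i]))
    return idx, (point, 1)


def reduce_update(values):
    """Reduce step: average all points that share one center index."""
    dim = len(values[0][0])
    total = [0.0] * dim
    count = 0
    for point, n in values:
        for d in range(dim):
            total[d] += point[d]
        count += n
    return tuple(t / count for t in total)


def kmeans(points, centers, max_iter=20, tol=1e-6):
    """Iterate map and reduce until the centers stop moving."""
    for _ in range(max_iter):
        groups = defaultdict(list)
        for p in points:
            key, value = map_assign(p, centers)
            groups[key].append(value)  # stands in for the shuffle phase
        new_centers = [
            reduce_update(groups[i]) if i in groups else centers[i]
            for i in range(len(centers))
        ]
        shift = max(distance(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers
```

A call such as `kmeans(points, canopy_centers(points, t2=3.0))` lets the Canopy pass decide both the number of clusters and their starting positions, which removes the manual choice of initial centers that the abstract identifies as a source of deviation. In a real Hadoop job, `map_assign` and `reduce_update` would correspond to the mapper and reducer, and the current centers would have to be distributed to every mapper before each iteration.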