Analysis And Research On User Online Shopping Behavior Based On Hadoop

Posted on:2022-10-15

Degree:Master

Type:Thesis

Country:China

Candidate:J X Zhai

Full Text:PDF

GTID:2518306566977539

Subject:Master of Engineering

Abstract/Summary:


Now that the Internet has entered an era of rapid development, large amounts of user data are transmitted to and stored on servers at all times, and how to mine this data and turn it into value has become a research focus in many fields. Traditional data storage and clustering algorithms can no longer meet these needs when faced with massive data, and the rise of Hadoop has opened a new direction for clustering algorithms that must process massive data. Hadoop is a framework made up of multiple components for the distributed processing of data; its core components are the Hadoop Distributed File System (HDFS) and MapReduce. HDFS locates and stores files through the cooperation of the nodes in the cluster, while the MapReduce programming framework combines user-defined functions with its own components to compute over data in parallel.

The K-means algorithm is a classic and widely used clustering algorithm. However, its initial center points are often selected manually, which can cause large deviations in the results, and its efficiency drops significantly when it is applied to massive data. The main work of this thesis is as follows. First, to address these shortcomings, two improved schemes are proposed: one based on density weights and one combined with the Canopy algorithm. Then, the algorithm is combined with MapReduce to parallelize it; the parallel process is further optimized, and a half-area calculation method is proposed. To verify the practical efficiency of the optimized algorithm, user data from Jingdong (JD.com) is used to compare the performance of the traditional K-means algorithm and the optimized K-means algorithm on a Hadoop platform built on virtual machines. Finally, a user online shopping behavior analysis system is designed and implemented in the distributed cluster environment: the improved algorithm processes users' behavior data offline, and the results are displayed on a visualization page.
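The Canopy-seeded, MapReduce-parallelized K-means idea summarized above can be illustrated with a small single-process sketch. It is not the thesis's implementation: it assumes Euclidean distance, hypothetical thresholds `t1 > t2`, takes the number of canopies as k, and simulates one MapReduce job per iteration with plain Python functions (`map_phase` emits each point keyed by its nearest center, `reduce_phase` averages the points per key). All function names are hypothetical, and the density-weight scheme and the half-area optimization are not shown.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Sequence, Tuple

# Hypothetical sketch, not the thesis's code. Euclidean distance is assumed.
Point = Tuple[float, ...]


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopy(points: List[Point], t1: float, t2: float) -> List[Tuple[Point, List[Point]]]:
    """Canopy clustering; returns (center, members) pairs. Requires t1 > t2 > 0."""
    if not t1 > t2 > 0:
        raise ValueError("need t1 > t2 > 0")
    remaining = list(points)
    canopies = []
    while remaining:
        center = remaining.pop(0)  # deterministic pick; the classic method picks at random
        # Loose threshold t1: points that join this canopy (canopies may overlap).
        members = [center] + [p for p in remaining if dist(p, center) < t1]
        canopies.append((center, members))
        # Tight threshold t2: points close enough never become a later canopy center.
        remaining = [p for p in remaining if dist(p, center) >= t2]
    return canopies


def map_phase(points: Iterable[Point], centers: List[Point]) -> Iterator[Tuple[int, Tuple[Point, int]]]:
    """Simulated map step: emit (index of nearest center, (point, 1)) per point."""
    for p in points:
        idx = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
        yield idx, (p, 1)


def reduce_phase(pairs: Iterable[Tuple[int, Tuple[Point, int]]], dim: int) -> Dict[int, Point]:
    """Simulated reduce step: sum points and counts per key, then average."""
    sums: Dict[int, List[float]] = defaultdict(lambda: [0.0] * dim)
    counts: Dict[int, int] = defaultdict(int)
    for idx, (p, c) in pairs:
        acc = sums[idx]
        for j in range(dim):
            acc[j] += p[j]
        counts[idx] += c
    return {idx: tuple(v / counts[idx] for v in acc) for idx, acc in sums.items()}


def kmeans(points: List[Point], centers: List[Point], max_iter: int = 20, tol: float = 1e-6) -> List[Point]:
    """K-means where each iteration is one simulated map/reduce round."""
    centers = list(centers)
    dim = len(points[0])
    for _ in range(max_iter):
        new = reduce_phase(map_phase(points, centers), dim)
        # A cluster that received no points keeps its previous center.
        updated = [new.get(i, centers[i]) for i in range(len(centers))]
        shift = max(dist(a, b) for a, b in zip(centers, updated))
        centers = updated
        if shift < tol:
            break
    return centers


if __name__ == "__main__":
    pts = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
    seeds = [c for c, _ in canopy(pts, t1=3.0, t2=1.5)]  # canopy centers seed K-means
    print(len(seeds), kmeans(pts, seeds))
```

On a real Hadoop cluster the map step would run in parallel over HDFS input splits, and a combiner could pre-sum partial (sum, count) pairs before the shuffle; here everything runs in one process only to show the data flow.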

Keywords/Search Tags:

K-means, Canopy, big data, clustering algorithm, user behavior


Related items

1	User Behavior Analysis In Software Version Management System
2	Research On The Application Of User Behavior Analysis Based On Hadoop
3	High Dimensional Fuzzy C-Means Clustering Recommendation Algorithm Based On Density Canopy
4	Research On Hot Topics Discovery In Microblog Based On Distributed K-means Algorithms
5	Research Of K-means Clustering Algorithm Based On MapReduce
6	Research On Document Clustering Algorithm Based On K-means
7	Campus Network User Behavior Analysis And Research
8	The Research Of Clustering Algorithm Based On Hadoop Cloud Computing Platform
9	The Optimization Of Parallelized K-means Based On Mahout
10	Research And Implement Of A Method For Tracing User Behavior Based On Improved K-means Algorithm