Research On The Application Of User Behavior Analysis Based On Hadoop

Posted on: 2017-04-19
Degree: Master
Type: Thesis
Country: China
Candidate: S S Chen
GTID: 2308330491451596
Subject: Computer technology
Abstract/Summary:
With the rapid development of the Internet, large volumes of user behavior data are generated and stored on servers every day, so extracting the value and potential benefits of users from this data has become a central concern for Internet businesses. Such massive user behavior data poses a great challenge to traditional data storage and data mining algorithms, and Hadoop has emerged as a solution to this problem.

This thesis studies user behavior analysis based on Hadoop. Building on research into Hadoop, Web log mining and clustering algorithms, it proposes a K-means clustering method based on the Canopy method, which overcomes several shortcomings of K-means: the selection of initial cluster centers, the elimination of outliers and the limitation of data. It also designs a corresponding user behavior analysis system that processes Web logs stored in the HDFS distributed storage system. Using the MapReduce programming model, the traditional Canopy and K-means methods are executed in parallel to perform clustering analysis of user behavior data from Web logs.

Finally, a single-machine comparative experiment and a cluster speedup ratio experiment are carried out on actual user query logs from Sogou Laboratory. The results verify that the Canopy-based K-means clustering method consistently delivers good results and performance, whether on a single machine or on a cluster. This shows that the proposed clustering method ensures both the efficiency of the algorithm and the accuracy of clustering. In conclusion, the proposed solution is well suited to distributed architectures, offering high efficiency and wide applicability.
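The abstract does not state the distance measure, the Canopy thresholds, or the exact outlier rule, so the following is only a minimal single-process Python sketch under assumptions: Euclidean distance, hypothetical thresholds `t1 > t2`, and a hypothetical `min_size` rule that discards sparse canopies as outliers. The surviving canopy centers seed K-means (so K equals the number of surviving canopies), and each K-means iteration is split into mapper, shuffle, and reducer steps that imitate the MapReduce model; it does not call the Hadoop API.

```python
import math
from collections import defaultdict
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance (an assumed metric; the thesis may use another)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopy(points: List[Point], t1: float, t2: float, min_size: int = 1) -> List[Point]:
    """Canopy pass: return centers of canopies with at least min_size points.
    The min_size outlier rule is a hypothetical choice for illustration."""
    assert t1 > t2 > 0
    remaining = list(points)
    centers: List[Point] = []
    while remaining:
        center = remaining.pop(0)
        # Loose threshold T1: points that join this canopy.
        members = [p for p in remaining if dist(p, center) < t1]
        # Tight threshold T2: points that can no longer start a new canopy.
        remaining = [p for p in remaining if dist(p, center) >= t2]
        if len(members) + 1 >= min_size:  # +1 counts the center itself
            centers.append(center)
    return centers


def drop_outliers(points: List[Point], centers: List[Point], t1: float) -> List[Point]:
    """Keep only points within T1 of at least one surviving canopy center."""
    return [p for p in points if any(dist(p, c) < t1 for c in centers)]


def kmeans_map(point: Point, centers: List[Point]) -> Tuple[int, Tuple[Point, int]]:
    """Mapper: emit (index of nearest center, (point, 1))."""
    idx = min(range(len(centers)), key=lambda i: dist(point, centers[i]))
    return idx, (point, 1)


def kmeans_reduce(values: List[Tuple[Point, int]]) -> Point:
    """Reducer: average all points assigned to one center."""
    dim = len(values[0][0])
    sums = [0.0] * dim
    count = 0
    for p, c in values:
        for d in range(dim):
            sums[d] += p[d]
        count += c
    return tuple(s / count for s in sums)


def kmeans(points: List[Point], centers: List[Point],
           max_iter: int = 20, tol: float = 1e-6) -> List[Point]:
    """K-means seeded with canopy centers; one loop pass = one 'job'."""
    centers = list(centers)
    if not centers:
        return []
    for _ in range(max_iter):
        groups = defaultdict(list)  # simulates the shuffle/group-by-key phase
        for p in points:
            k, v = kmeans_map(p, centers)
            groups[k].append(v)
        # A center that received no points keeps its previous position.
        new_centers = [kmeans_reduce(groups[i]) if groups[i] else centers[i]
                       for i in range(len(centers))]
        shift = max(dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers


if __name__ == "__main__":
    pts = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
           (5.0, 5.0), (5.1, 4.9), (4.8, 5.2), (20.0, 20.0)]
    seeds = canopy(pts, t1=3.0, t2=1.5, min_size=2)
    print(kmeans(drop_outliers(pts, seeds, 3.0), seeds))
```

On a real Hadoop cluster, each K-means iteration would typically run as one MapReduce job: mappers read the current centers and emit nearest-center assignments, and reducers write updated centers back to HDFS. A combiner that pre-sums the `(point, count)` pairs per center on each node would cut shuffle traffic, which matters for the kind of speedup ratio the thesis measures.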
Keywords/Search Tags: Hadoop, MapReduce, User behavior analysis, Web log mining, K-means clustering, Canopy clustering