
User Clustering Based On MapReduce In Web Log Mining

Posted on: 2016-11-22
Degree: Master
Type: Thesis
Country: China
Candidate: S S Zhou
Full Text: PDF
GTID: 2308330479499194
Subject: Computer Science and Technology
Abstract/Summary:
The volume of data on the Web is growing rapidly, and mining it can yield a great deal of important information. In this thesis, Web logs from the Innovation Knowledge Cloud platform are mined to understand customer needs and browsing habits, which in turn helps to enrich the site's content and to optimize how the site is presented. User clustering in Web log mining groups users with the same browsing habits into the same class.

The user clustering process introduces some error. On the one hand, most of the algorithms used in this thesis are statistics-based user clustering algorithms, and the accuracy of statistics depends on a large number of experiments. To reduce the statistical error, multiple test samples are selected for the experiments. On the other hand, user identification is an important step in user clustering. Users are first identified by their IP address and user agent; session identification algorithms are then applied to further improve user identification.

User similarity calculation is very important for user clustering. To improve clustering accuracy, a multi-dimensional correlation matrix is built from the user access path frequency, the user access path sequence, and the content of the pages the user visits. A coordination coefficient is set for each matrix to assign its weight in the overall similarity calculation, which keeps the similarity calculation stable.

When the data volume is massive, Web log mining on a single node runs into bottlenecks in both time and space. To solve this problem, the whole user clustering process is built on the Hadoop distributed platform, using MapReduce to process the log files and complete the similarity calculation for user clustering.
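The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: it simulates the map, shuffle, and reduce steps in memory rather than on Hadoop, and the record layout, the 30-minute session timeout, the cosine measure, and the two weights `w_freq` and `w_seq` (standing in for the coordination coefficients) are all assumptions made for illustration. The page-content dimension is left out to keep the example short.

```python
from collections import Counter, defaultdict
from math import sqrt

SESSION_TIMEOUT = 30 * 60  # seconds; an assumed value, not taken from the thesis


def map_record(record):
    """Map step: key each log record by (IP, user agent) to identify the user."""
    ip, agent, timestamp, url = record  # hypothetical record layout
    return (ip, agent), (timestamp, url)


def reduce_sessions(hits):
    """Reduce step: sort one user's hits by time and split them into sessions."""
    sessions, current, last = [], [], None
    for timestamp, url in sorted(hits):
        if last is not None and timestamp - last > SESSION_TIMEOUT:
            sessions.append(current)
            current = []
        current.append(url)
        last = timestamp
    if current:
        sessions.append(current)
    return sessions


def run_job(records):
    """Simulate map, shuffle, and reduce in memory for a small log sample."""
    grouped = defaultdict(list)
    for record in records:
        key, value = map_record(record)
        grouped[key].append(value)
    return {user: reduce_sessions(hits) for user, hits in grouped.items()}


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def page_frequency(sessions):
    """Count how often each page is visited across a user's sessions."""
    return Counter(url for session in sessions for url in session)


def transition_frequency(sessions):
    """Count consecutive page pairs, a simple proxy for the access path order."""
    return Counter(pair for session in sessions for pair in zip(session, session[1:]))


def similarity(s1, s2, w_freq=0.5, w_seq=0.5):
    """Weighted sum of the per-dimension similarities; the weights are assumed."""
    return (w_freq * cosine(page_frequency(s1), page_frequency(s2))
            + w_seq * cosine(transition_frequency(s1), transition_frequency(s2)))
```

Calling `run_job` on a list of `(ip, agent, timestamp, url)` tuples returns each user's sessions, and `similarity` can then be evaluated for every pair of users to fill the matrix that a clustering step would consume. On a real cluster the grouping done by `run_job` would be handled by the MapReduce shuffle.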
Keywords/Search Tags: Cloud Computing, Data Mining, User Clustering, Parallel Computing, Similarity Calculation