User Clustering Based On MapReduce In Web Log Mining

Posted on:2016-11-22

Degree:Master

Type:Thesis

Country:China

Candidate:S S Zhou

Full Text:PDF

GTID:2308330479499194

Subject:Computer Science and Technology

Abstract/Summary:

The volume of data on the Web is growing rapidly, and mining it can yield a great deal of valuable information. In this thesis, mining the Web logs generated by the Innovation Knowledge Cloud platform helps to understand customer needs and browsing habits, and supports enriching the site's content and optimizing the site. User clustering in Web log mining groups users with similar browsing habits into the same class. The clustering process introduces some error. On the one hand, most of the algorithms used in this thesis are statistics-based user clustering algorithms, and their statistical accuracy depends on a large number of experiments; to reduce statistical error, multiple test samples are used in the experiments. On the other hand, user identification is an important step in user clustering. Users are first identified by IP address and user agent, and a session identification algorithm is then applied to further improve user identification. User similarity calculation is central to user clustering. To improve clustering accuracy, a multi-dimensional correlation matrix is built from the user's access path frequency, the user's access path sequence, and the content of the pages the user accesses. A coordination coefficient is set for each matrix to determine its weight in the overall similarity calculation, which keeps the similarity calculation stable. With massive amounts of data, Web log mining on a single node runs into time and space bottlenecks. To solve this problem, the whole user clustering process is built on the Hadoop distributed platform, and MapReduce is used to process the log files and complete the similarity calculation for user clustering.
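The steps described in the abstract (user identification by IP address and user agent, session identification, and a weighted combination of several similarity measures) can be illustrated with a minimal single-node sketch. It is not the thesis's implementation: the function names, the record layout `(ip, user_agent, timestamp, url)`, the 30-minute session timeout, and the example weights are all assumptions made for illustration, and the Hadoop MapReduce distribution is omitted.

```python
from collections import defaultdict

# Assumed session gap (30 minutes, in seconds); the abstract does not give a value.
SESSION_TIMEOUT = 1800


def identify_users(records):
    """Group log records by (ip, user_agent); each distinct pair counts as one user.

    `records` is an iterable of (ip, user_agent, timestamp, url) tuples,
    a hypothetical layout chosen for this sketch.
    """
    users = defaultdict(list)
    for ip, agent, timestamp, url in records:
        users[(ip, agent)].append((timestamp, url))
    return dict(users)


def split_sessions(visits, timeout=SESSION_TIMEOUT):
    """Split one user's (timestamp, url) visits into sessions.

    A new session starts whenever the gap between two consecutive
    requests exceeds `timeout` seconds.
    """
    sessions, current, last_ts = [], [], None
    for timestamp, url in sorted(visits):
        if last_ts is not None and timestamp - last_ts > timeout:
            sessions.append(current)
            current = []
        current.append(url)
        last_ts = timestamp
    if current:
        sessions.append(current)
    return sessions


def combined_similarity(similarities, weights):
    """Combine per-dimension similarities for one user pair into one score.

    `similarities` holds one value per dimension (for example path frequency,
    path sequence, page content); `weights` plays the role of the
    coordination coefficients. The result is the weighted average.
    """
    if len(similarities) != len(weights):
        raise ValueError("need exactly one weight per similarity value")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * s for w, s in zip(weights, similarities)) / total
```

In a MapReduce setting, `identify_users` corresponds roughly to a map step that emits `(ip, user_agent)` as the key and a reduce step that collects each user's visits; how the thesis actually partitions the work is not stated in the abstract.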

Keywords/Search Tags:

Cloud Computing, Data Mining, User Clustering, Parallel Computing, Similarity Calculation

Related items

1	Research On Parallel Processing Technology Of Large-scale Text Mining Under Cloud Computing Environment
2	General Cloud-native Big Data Architecture With Kubernetes
3	Research About Data Mining Technologies Based On Cloud Computing
4	Research And Application Of Clustering Mining Algorithm Based On Cloud Computing
5	Research On GSM-R Data Mining Platform With Cloud Computing
6	Research And Implementation Of Parallel Data Mining Algorithms Based On Cloud Computing
7	Research On Data Mining Algorithm And Its Parallelization In Cloud Computing
8	Research On The Parallel Data Mining Strategy Under The Cloud Computing Environment
9	Research Of Key Technologies Of Data Mining Telecommunication Oriented
10	Research On Key Technologies Of Secure Data Mining Outsourcing In Cloud Computing Environment