The Study Of Clustering Web Log Based On User’s Browsing Interest

Posted on: 2014-01-01 | Degree: Master | Type: Thesis
Country: China | Candidate: Z Rong | Full Text: PDF
GTID: 2248330398982534 | Subject: Computer software and theory
Abstract/Summary:
In recent years, with the rapid development of computer science and technology, networks have come into wide use and have become an important means of communication. More and more people rely on the network to get the information they need. At the same time, companies that hope to profit from the web attach growing importance to the website as a platform for information exchange. An important issue in advanced network technology is how to optimize a running website in a timely manner, according to users' access habits and requirements, so that it satisfies their individual needs. To address this problem, researchers have proposed web log mining: compute the similarity between users from the web logs, cluster the users with one of various clustering methods, and then analyze the clusters to understand the needs and interests of each user group, so as to improve the network service and provide users with quality service.

As an important research field in data mining, web log mining still has many open problems. First, in user feature representation, researchers have not been able to identify users' interests accurately from web logs; pages are only divided into target pages and navigation pages, so the user's browsing interest cannot be used to represent user features accurately. Second, in the use of clustering algorithms, the influence of outliers in the web log is often ignored. To solve these two problems, this paper proposes a CHAMELEON algorithm based on user browsing interest. The work covers the following two aspects:

1. Extracting user features: First, user features are partially extracted according to the users' browsing interest. Then, taking transaction identification as the research object, users' browsing time and page content are combined to extract the user features further. Finally, the similarity between users is calculated from these features. Experiments show that this method reflects users' true browsing interest.

2. Web log clustering based on user browsing interest: The similarity between users, obtained from their browsing interest, is used as the weight between two points in the CHAMELEON clustering algorithm. Based on the existing web logs, evidence theory is used to deal with outliers. In this way, isolated users are eliminated and the noise immunity of the CHAMELEON algorithm is improved. The experimental data set is derived from DePaul University's benchmark data, which includes a total of 20950 sessions from 5446 users. Experimental results show that the improved CHAMELEON algorithm not only captures users' browsing interest well but also greatly improves clustering performance and the ability to eliminate isolated points.
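The idea of using browsing-interest similarity as the weight between two users in the clustering graph can be sketched as follows. This is only a minimal illustration and not the method of the thesis itself: it assumes that interest in a page is measured by the share of a user's total viewing time spent on that page, and that similarity between two users is the cosine similarity of their interest profiles. The function names and the sample log are hypothetical.

```python
import math
from collections import defaultdict


def interest_profile(visits):
    """Map each page to the share of the user's viewing time spent on it.

    `visits` is a list of (page, seconds) pairs for one user.
    Assumption: viewing time is a proxy for browsing interest.
    """
    time_per_page = defaultdict(float)
    for page, seconds in visits:
        time_per_page[page] += seconds
    total = sum(time_per_page.values())
    if total == 0:
        return {}
    return {page: t / total for page, t in time_per_page.items()}


def cosine_similarity(p, q):
    """Cosine similarity between two sparse interest profiles (dicts)."""
    dot = sum(w * q[page] for page, w in p.items() if page in q)
    norm_p = math.sqrt(sum(w * w for w in p.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


def similarity_edges(logs):
    """Return {(user_a, user_b): similarity} for every pair with similarity > 0.

    These weights could serve as edge weights in a graph-based
    clustering algorithm such as CHAMELEON.
    """
    profiles = {user: interest_profile(v) for user, v in logs.items()}
    users = sorted(profiles)
    edges = {}
    for i, a in enumerate(users):
        for b in users[i + 1:]:
            s = cosine_similarity(profiles[a], profiles[b])
            if s > 0:
                edges[(a, b)] = s
    return edges


# Hypothetical sample log: user -> list of (page, seconds viewed)
logs = {
    "u1": [("/courses", 60), ("/news", 20)],
    "u2": [("/courses", 30), ("/news", 10)],
    "u3": [("/admissions", 45)],
}
edges = similarity_edges(logs)
```

In this sample, u1 and u2 split their time across the same pages in the same proportions, so they receive an edge, while u3 shares no pages with either and gets no edge at all. A user left without edges in this way is the kind of isolated point that an outlier-handling step would need to address.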
Keywords/Search Tags: Web log mining, CHAMELEON, User clustering, Users' browsing path interest