The Study Of Clustering Web Log Based On User's Browsing Interest

Posted on:2014-01-01

Degree:Master

Type:Thesis

Country:China

Candidate:Z Rong

Full Text:PDF

GTID:2248330398982534

Subject:Computer software and theory

Abstract/Summary:

In recent years, with the rapid development of computer science and technology, the Internet has come into wide use and has become an important means of communication. More and more people obtain the information they need online. At the same time, websites, as platforms for information exchange, are receiving growing attention from companies that hope to profit from them. A key question in advanced network technology is therefore how a running website can be optimized in a timely manner according to users' access habits and requirements, so as to satisfy their individual needs. To address this problem, researchers have proposed Web log mining: user similarity is computed from Web logs, users are clustered with various clustering methods, and the resulting clusters are analyzed to understand the needs and interests of user groups, so that network services can be improved and users receive better service.

As an important research area within data mining, Web log mining still faces several problems. First, in representing user features, researchers have not been able to identify users' interests accurately from Web logs: pages are merely divided into target pages and navigation pages, so users' browsing interest is not used to represent user features accurately. Second, clustering algorithms applied to Web logs often ignore the influence of outliers. To solve these two problems, this paper proposes a CHAMELEON algorithm based on user browsing interest, with work in the following two areas:

1. Extracting user features. First, user features are partially extracted according to users' browsing interest. Then, taking transaction identification as the object of study, users' browsing time and page content are combined to extract user features further. Finally, user similarity is calculated from these features. Experiments show that this method reflects users' true browsing interest.

2. Clustering Web logs based on user browsing interest. The user similarity obtained from browsing interest is used as the edge weight between two points in the CHAMELEON clustering algorithm. Outliers in the existing Web logs are handled with evidence theory. In this way, isolated users are eliminated and the noise immunity of the CHAMELEON algorithm is improved.

The experimental data set is derived from DePaul University's benchmark data and contains a total of 20,950 sessions from 5,446 users. Experimental results show that the improved CHAMELEON algorithm not only captures users' browsing interest well but also greatly improves clustering performance and the ability to eliminate isolated points.
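The abstract does not give the thesis's interest formula, so the following Python sketch only illustrates the general idea behind the two work areas: turn dwell time and page content into a per-user interest vector, compare users by similarity, and use that similarity as edge weights in the sparse k-nearest-neighbour graph that CHAMELEON partitions in its first phase. The interest formula (dwell time divided by page size, normalized per user), cosine similarity, the union rule for the k-NN graph, and all data values are assumptions for illustration, not the author's method.

```python
import math
from collections import defaultdict


def interest_vectors(records, page_size):
    """Build a per-user interest vector {page: weight}.

    Assumption (not from the thesis): interest in a page is dwell time divided
    by the page's content size, normalized so each user's weights sum to 1.
    """
    raw = defaultdict(lambda: defaultdict(float))
    for user, page, seconds in records:
        raw[user][page] += seconds / page_size[page]
    vectors = {}
    for user, pages in raw.items():
        total = sum(pages.values())
        if total > 0:  # skip users with no measurable dwell time
            vectors[user] = {p: w / total for p, w in pages.items()}
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(a[p] * b[p] for p in a.keys() & b.keys())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_graph(vectors, k):
    """Sparse k-nearest-neighbour graph, the input to CHAMELEON's first phase.

    Each user links to its k most similar users with the similarity as edge
    weight; an edge is kept if either endpoint lists the other (union rule).
    """
    users = sorted(vectors)
    graph = {u: {} for u in users}
    for u in users:
        sims = sorted(
            ((cosine(vectors[u], vectors[v]), v) for v in users if v != u),
            reverse=True,
        )
        for sim, v in sims[:k]:
            if sim > 0:  # zero-similarity users stay unlinked
                graph[u][v] = sim
                graph[v][u] = sim
    return graph


# Hypothetical sessions: (user, page, dwell seconds) and page sizes.
records = [
    ("u1", "/courses", 120), ("u1", "/admissions", 30),
    ("u2", "/courses", 100), ("u2", "/admissions", 40),
    ("u3", "/news", 200),
]
page_size = {"/courses": 2.0, "/admissions": 1.0, "/news": 1.0}

vectors = interest_vectors(records, page_size)
graph = knn_graph(vectors, k=1)
print(graph)  # u1 and u2 are linked; u3 has no edges
```

With `k=1`, users u1 and u2, whose interest is concentrated on the same pages, are joined by an edge of weight about 0.978, while u3 stays unlinked. The later CHAMELEON phases (partitioning the graph and merging sub-clusters by relative interconnectivity and relative closeness) are not shown.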

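The abstract says only that evidence theory is used to handle outliers. Assuming this refers to Dempster-Shafer theory, the usual meaning of the term, the sketch below shows how several weak indicators that a user is isolated could be fused with Dempster's rule of combination over the frame {outlier, normal}. The indicator scores (for example, 1 minus a user's strongest similarity to any other user), the `reliability` discount, and the function names are hypothetical; the thesis may define its evidence sources and decision threshold differently.

```python
from functools import reduce
from itertools import product

O, N = "outlier", "normal"
THETA = frozenset({O, N})  # frame of discernment


def combine(m1, m2):
    """Dempster's rule of combination for two mass functions over THETA."""
    fused, conflict = {}, 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        c = a & b
        if c:
            fused[c] = fused.get(c, 0.0) + x * y
        else:
            conflict += x * y  # mass assigned to contradictory hypotheses
    if conflict >= 1.0:
        raise ValueError("total conflict: evidence cannot be combined")
    return {s: v / (1.0 - conflict) for s, v in fused.items()}


def evidence_from_score(score, reliability):
    """Hypothetical mapping of an isolation score in [0, 1] to a mass function.

    Mass 1 - reliability is left uncommitted on THETA (ignorance).
    """
    return {
        frozenset({O}): reliability * score,
        frozenset({N}): reliability * (1.0 - score),
        THETA: 1.0 - reliability,
    }


def outlier_belief(scores, reliability=0.8):
    """Fuse all indicators and return the combined mass on {outlier}."""
    masses = [evidence_from_score(s, reliability) for s in scores]
    fused = reduce(combine, masses)
    return fused.get(frozenset({O}), 0.0)


isolated = outlier_belief([0.9, 0.8])  # weakly connected user
typical = outlier_belief([0.1, 0.2])   # well-connected user
print(round(isolated, 3), round(typical, 3))  # prints 0.879 0.073
```

Two fairly strong isolation indicators reinforce each other to a belief of about 0.88 in the outlier hypothesis, more than either indicator gives alone (0.72 and 0.64). A user whose belief exceeds a chosen threshold could then be removed before clustering.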
Keywords/Search Tags:

Web Log mining, Chameleon, User clustering, Users' browsing path interest

Related items

1	The Study Of Clustering Web Users Based On User's Browsing Path
2	Based On Web-log Frequent Browsing Paths Mining And Technology Analysis
3	Network Users Based On Web Log Clustering And Implementation
4	User Browsing Interest Prediction And Personalization Recommendation Strategies Based On WEB Usage Mining
5	Research On Modeling User Webpage Browsing Interest
6	Clustering Based Net User Interest Mining
7	Web User Interest Mining Based On Ontology
8	The Study Of Web User Fuzzy Clustering Based On Path Similarity
9	User Identification And Interest Analysis Of Internet Access Log Data
10	Research And Implementation Of Mining Implicit User Interest