Research On Methods Of Mining Web Users' Usage Patterns And Interests

Posted on: 2011-09-29
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z G Zhu
Full Text: PDF
GTID: 1119360305955711
Subject: Management Science and Engineering
Abstract/Summary:
With the rapid growth of the Internet and the WWW, users' access information has become enormous and widespread. It records users' access details along four dimensions: user, time, space, and access object. By mining these access details further, we can discover deeper knowledge and rules: the usage patterns and interests of a single user or a user group. This knowledge can be widely applied in fields such as Web personalization, system improvement, and business intelligence. The main contributions of this dissertation are as follows:

1. Following the four main steps of a user usage pattern mining system (data collection, data preparation, pattern discovery, and pattern analysis), this dissertation first collects and summarizes the classical and latest research at home and abroad to present an overall view of the field. On this basis, it identifies the two key technical levels of the research: data preparation of user access information, and mining methods for Web usage patterns and interests. These summaries lay the foundation for the rest of the dissertation.

2. At the data preparation level, user session identification is one of the most important steps, so the third chapter presents a method for Web user session identification based on URL semantic analysis. First, every URL in the Web log files is conceptualized to obtain semantic information, using a Web directory as the ontology. On this basis, several factors are defined to measure the semantic distance between URLs, and a semantic distance matrix between the URLs within a certain time interval is established. Second, two semantic outlier detection methods, SOAs and SOAD, are presented to segment users' sessions for static and dynamic Web logs respectively; the SOAs and SOAD computations determine whether a candidate qualifies as a semantic outlier for sessionization. Finally, according to the experimental results, this method outperforms traditional sessionization methods based on time and navigation-oriented heuristics on all evaluation measures. Notably, the method is applied directly in the research of the following chapters.

3. At the level of mining methods for Web usage patterns and interests, existing Web user clustering techniques focus only on discovering knowledge from a static snapshot of Web log data. This dissertation instead presents a method for clustering Web users that starts from the dynamic nature of their historical access data. In this method, extended WAS trees and historical WAS trees are constructed first, and PP-WAPs (persistent and preferred Web users' access patterns) are then extracted from the historical WAS trees as the clustering feature. Next, similarity measures between PP-WAPs are defined to calculate the similarities between users. Once a similarity matrix of Web users has been obtained, the well-known K-Medoid clustering technique is employed to generate the clusters. Finally, two experiments are conducted: PP-WAP extraction and Web user clustering. The experimental results support this novel use of the dynamic nature of Web users' historical access data as a clustering feature, and the algorithm also achieves better scalability and computational efficiency.

4. Also at the level of mining methods for Web usage patterns and interests, this dissertation constructs two Web user interest navigational path models, INPM and SINPMPe, based on the Hidden Markov Model and taking the perspective of users' interests. Methods for mining interest association rules from the two models are then presented. These interest association rules reflect not only the time characteristics of users' access paths but also the best association path information carrying users' interests. Finally, three sets of experiments are conducted: an experiment with simulated data, an experiment with real data, and a comparison with traditional methods. According to the experimental results, these interest association rule mining methods are highly efficient and scalable. Website designers can use the rules to improve the structure of a Web site. The mining can also be run periodically and offline.
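
The final clustering step of contribution 3 (K-Medoid clustering applied to a user similarity matrix) can be illustrated with a minimal sketch. This is not the dissertation's implementation. It assumes that each similarity lies in [0, 1] and is converted to a dissimilarity as 1 - similarity, and it uses a simple alternating assign-and-update variant of K-Medoid. The function name `k_medoids`, its parameters, and the similarity values are hypothetical; the PP-WAP similarity measures themselves are defined in the dissertation and are not reproduced here.

```python
from typing import List, Sequence, Tuple


def k_medoids(
    dist: Sequence[Sequence[float]],
    initial_medoids: Sequence[int],
    max_iter: int = 100,
) -> Tuple[List[int], List[int]]:
    """Cluster items from a precomputed, symmetric dissimilarity matrix.

    Returns (medoids, labels), where labels[i] is the position in
    `medoids` of the medoid closest to item i.
    """
    n = len(dist)
    medoids = list(initial_medoids)

    def assign(current: List[int]) -> List[int]:
        # Each item joins its nearest medoid (ties go to the lowest position).
        return [
            min(range(len(current)), key=lambda c: dist[i][current[c]])
            for i in range(n)
        ]

    labels = assign(medoids)
    for _ in range(max_iter):
        new_medoids = []
        for c in range(len(medoids)):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                # Keep the old medoid if its cluster became empty.
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total
            # dissimilarity to all members of its cluster.
            new_medoids.append(
                min(members, key=lambda m: sum(dist[m][j] for j in members))
            )
        if new_medoids == medoids:
            break  # converged: medoids no longer change
        medoids = new_medoids
        labels = assign(medoids)
    return medoids, labels


# Made-up similarity matrix for five users (assumption: values in [0, 1]).
similarity = [
    [1.0, 0.9, 0.8, 0.1, 0.2],
    [0.9, 1.0, 0.7, 0.2, 0.1],
    [0.8, 0.7, 1.0, 0.1, 0.1],
    [0.1, 0.2, 0.1, 1.0, 0.9],
    [0.2, 0.1, 0.1, 0.9, 1.0],
]
# Assumed conversion from similarity to dissimilarity.
distance = [[1.0 - s for s in row] for row in similarity]

medoids, labels = k_medoids(distance, initial_medoids=[0, 3])
print(medoids, labels)
```

With these made-up values, users 0 to 2 fall into one cluster and users 3 and 4 into the other. The alternating scheme is sensitive to the choice of initial medoids, so in practice it is usually run from several starting points.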
Keywords/Search Tags: Web Data Mining, Web Usage Pattern Mining, User Session Identification, Web User Clustering, Users' Access Interests