
Mining Web User Access Patterns Based On Rough Sets

Posted on: 2009-08-27 | Degree: Master | Type: Thesis
Country: China | Candidate: Z J Liu
GTID: 2178360272980252 | Subject: Computer application technology
Abstract/Summary:
Integrating data mining with the World Wide Web makes it possible to mine the log records that Web servers collect. Applied to Web logs, data mining can discover how users access the pages of a website; this is known as Web log access pattern mining. Its aim is to extract useful information about users from Web logs, such as the most frequent visiting times, association rules, sequential patterns, clustering patterns, classification patterns and browsing trends. These results are of major significance for optimizing website structure and for providing personalized services to different categories of users, and Web log access pattern mining has become an active research topic in data mining. Taking into account the characteristics of the data produced as users browse the Web, the author describes in detail the concepts, methods and process of Web log access pattern mining.

First, data preprocessing for Web log mining was studied. Web log mining does not operate on the original Web content but on data extracted from the interaction between users and the Web server, including the requested URL, the IP address from which each request was sent, and the timestamp. These data carry rich information about users' visits. The work in this part focuses on extracting users' visiting characteristics (such as their access behavior, access frequency and the content they access) and on building a data model of user access behavior.

Second, Web log access pattern mining was studied on the basis of rough set theory. In rough set theory, knowledge is regarded as the ability to classify, that is, the ability to partition a universe of discourse. On this basis, a data model was given and the preprocessed data were discretized.
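The preprocessing and rough-set steps described above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the thesis's implementation: the 30-minute session timeout, the per-page 0/1 attributes, the single length cut point, the sample log and all function names (`sessionize`, `to_row`, `partition`, `approximations`, `at`) are hypothetical. It groups (IP, timestamp, URL) records into sessions, discretizes session length, partitions the sessions into indiscernibility classes (the "ability to partition a universe"), and computes the lower and upper approximations of the set of sessions that reached a hypothetical `/buy` page.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed session timeout: a common heuristic, not a value taken from the thesis.
SESSION_TIMEOUT = timedelta(minutes=30)


def sessionize(records):
    """Split (ip, timestamp, url) records into sessions, i.e. lists of URLs.

    Requests are grouped by IP address; a new session starts whenever the gap
    between two consecutive requests from the same IP exceeds SESSION_TIMEOUT.
    """
    by_ip = defaultdict(list)
    for ip, ts, url in sorted(records, key=lambda r: (r[0], r[1])):
        by_ip[ip].append((ts, url))
    sessions = []
    for hits in by_ip.values():
        current = [hits[0][1]]
        for (prev_ts, _), (ts, url) in zip(hits, hits[1:]):
            if ts - prev_ts > SESSION_TIMEOUT:
                sessions.append(current)
                current = []
            current.append(url)
        sessions.append(current)
    return sessions


def to_row(session, pages, cut=3):
    """Hypothetical data model: a 0/1 attribute per page plus a discretized length."""
    row = {page: int(page in session) for page in pages}
    # Discretization with one assumed cut point: <= cut requests is "short".
    row["length"] = "short" if len(session) <= cut else "long"
    return row


def partition(rows, attrs):
    """Indiscernibility classes: sets of row indices that agree on all attrs."""
    classes = defaultdict(set)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].add(i)
    return list(classes.values())


def approximations(rows, attrs, target):
    """Lower and upper approximations of the index set `target` w.r.t. attrs."""
    lower, upper = set(), set()
    for block in partition(rows, attrs):
        if block <= target:  # class lies entirely inside the concept
            lower |= block
        if block & target:  # class overlaps the concept
            upper |= block
    return lower, upper


def at(minutes):
    """Timestamp helper for the made-up sample log below."""
    return datetime(2009, 1, 1, 8, 0) + timedelta(minutes=minutes)


records = [
    ("10.0.0.1", at(0), "/index"), ("10.0.0.1", at(2), "/news"),
    ("10.0.0.1", at(90), "/index"), ("10.0.0.1", at(91), "/buy"),  # 88-min gap
    ("10.0.0.2", at(5), "/index"), ("10.0.0.2", at(6), "/news"),
    ("10.0.0.3", at(1), "/index"), ("10.0.0.3", at(3), "/buy"),
    ("10.0.0.4", at(10), "/index"), ("10.0.0.4", at(11), "/news"),
    ("10.0.0.4", at(12), "/buy"),
    ("10.0.0.5", at(20), "/index"), ("10.0.0.5", at(21), "/news"),
    ("10.0.0.5", at(22), "/faq"), ("10.0.0.5", at(23), "/contact"),
]
sessions = sessionize(records)
rows = [to_row(s, ["/index", "/news"]) for s in sessions]
bought = {i for i, s in enumerate(sessions) if "/buy" in s}  # target concept
lower, upper = approximations(rows, ["/index", "/news", "length"], bought)
print("lower:", lower, "upper:", upper, "boundary:", upper - lower)
```

With this sample log, sessions 1 and 3 form the lower approximation, sessions 0 to 4 the upper approximation, and sessions 0, 2 and 4 the boundary region: they are indiscernible on the chosen attributes, yet only session 4 reached `/buy`.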
A reduction algorithm was then applied to simplify the data, and stable classification rules were extracted. Because the boundaries between categories of sessions in Web logs are uncertain, a new method for clustering user access patterns based on rough set theory is also presented. The method takes into account both the sequential order of the pages within each session and the set of pages the session contains. Experimental results on test data sets show that the algorithm is feasible. Finally, open problems for future research are identified and future directions for Web log mining are discussed.
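The reduction, rule-extraction and rough clustering steps can be illustrated with a second sketch. The abstract does not specify the thesis's reduction algorithm or session similarity measure, so this code swaps in a common greedy heuristic (QuickReduct-style forward selection on the rough-set dependency degree) and a hypothetical similarity that averages a content term (Jaccard overlap of page sets) and an order term (longest common subsequence normalized by session length). The sample decision table, the cluster leaders, the weight `alpha`, the tolerance `eps` and all function names are assumptions. A session whose best similarity is not clearly ahead of another cluster's goes only into the upper approximations of the competing clusters, which is how rough clustering represents uncertain boundaries.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Indiscernibility classes: sets of row indices that agree on all attrs."""
    classes = defaultdict(set)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].add(i)
    return list(classes.values())


def dependency(rows, cond, dec):
    """Rough-set dependency degree: share of rows in the positive region POS_cond(dec)."""
    dec_classes = partition(rows, [dec])
    pos = set()
    for block in partition(rows, cond):
        if any(block <= d for d in dec_classes):  # class decides `dec` with certainty
            pos |= block
    return len(pos) / len(rows)


def quick_reduct(rows, cond, dec):
    """Greedy QuickReduct-style forward selection (an assumed stand-in heuristic)."""
    full = dependency(rows, cond, dec)
    reduct = []
    while dependency(rows, reduct, dec) < full:
        candidates = [a for a in cond if a not in reduct]
        reduct.append(max(candidates, key=lambda a: dependency(rows, reduct + [a], dec)))
    return reduct


def certain_rules(rows, reduct, dec):
    """One if-then rule per indiscernibility class (over the reduct) with a single decision."""
    found = []
    for block in partition(rows, reduct):
        decisions = {rows[i][dec] for i in block}
        if len(decisions) == 1:
            sample = rows[min(block)]
            found.append(({a: sample[a] for a in reduct}, decisions.pop()))
    return found


def lcs_len(a, b):
    """Length of the longest common subsequence of two page sequences (visit order)."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]


def similarity(s, t, alpha=0.5):
    """Hypothetical mix of content (Jaccard on page sets) and order (normalized LCS)."""
    content = len(set(s) & set(t)) / len(set(s) | set(t))
    order = lcs_len(s, t) / max(len(s), len(t))
    return alpha * content + (1 - alpha) * order


def rough_assign(sessions, leaders, eps=0.1):
    """Rough cluster membership: a session enters the upper approximation of every
    cluster within eps of its best similarity, and a lower approximation only when
    exactly one cluster qualifies. Sessions must be non-empty."""
    lower = [set() for _ in leaders]
    upper = [set() for _ in leaders]
    for i, s in enumerate(sessions):
        sims = [similarity(s, leader) for leader in leaders]
        best = max(sims)
        close = [k for k, v in enumerate(sims) if best - v <= eps]
        for k in close:
            upper[k].add(i)
        if len(close) == 1:
            lower[close[0]].add(i)
    return lower, upper


table = [
    {"news": 1, "faq": 0, "length": "short", "buy": 0},
    {"news": 0, "faq": 0, "length": "short", "buy": 1},
    {"news": 1, "faq": 1, "length": "long", "buy": 0},
    {"news": 0, "faq": 1, "length": "long", "buy": 1},
]
reduct = quick_reduct(table, ["news", "faq", "length"], "buy")
rules = certain_rules(table, reduct, "buy")

leaders = [["/index", "/news", "/sports"], ["/index", "/buy", "/pay"]]
sessions = [["/index", "/news"], ["/index", "/buy", "/pay"], ["/index", "/news", "/buy"]]
lower, upper = rough_assign(sessions, leaders)
print("reduct:", reduct, "rules:", rules)
print("lower:", lower, "upper:", upper)
```

In this toy table, `news` alone forms a reduct, yielding the rules "news = 1 then buy = 0" and "news = 0 then buy = 1". The session `/index, /news, /buy` is equally similar to both leaders, so it lands in the upper approximation (boundary) of both clusters rather than in either lower approximation.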
Keywords/Search Tags: Clustering, Rough sets, Web access patterns, Web log mining