Research On Algorithms For Mining Web Logs

Posted on: 2007-12-08
Degree: Master
Type: Thesis
Country: China
Candidate: Y B Zhan
GTID: 2178360215470208
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet, the Web has become the main platform for producing, publishing, handling, and processing information. However, Web information is expanding so quickly that it causes a serious problem, the "information explosion": information is abundant, but knowledge is relatively scarce. Web mining arose in response to this problem. Within Web mining, Web usage mining, which mines the logs kept on Web servers, has attracted intensive research interest in recent years.

This thesis presents work on Web usage mining. First, we introduce the basic concepts and methods of data mining and Web mining. Then we analyze the characteristics of Web logs and study in depth the techniques for preprocessing them. Finally, we study an algorithm for mining association rules based on a probability association graph and an algorithm for mining maximal frequent access patterns from Web logs. We have obtained the following research results:

1. We analyzed the characteristics of Web logs and their collection process. We describe the process of preprocessing Web logs and make some improvements to remedy problems in the conventional preprocessing method.

2. Combining the topological structure graph of the Web site with an analysis of user access behavior, we present the probability association graph, in which the process of user access is represented as a directed graph. Based on the probability association graph, we propose a novel algorithm for mining association rules. Finally, we show by theoretical analysis that our algorithm outperforms existing algorithms; its running time is O(n^3).

3. After analyzing existing algorithms for mining labeled trees, and in view of the concrete problem, we propose U-TreeMiner, a novel algorithm that mines maximal frequent subtrees from a database of unique labeled trees through pattern growth, based on the depth-first traversal encoding string of a unique labeled tree. Experiments show that U-TreeMiner is more efficient than other algorithms.

4. We use a unique labeled tree to represent a user session. Taking into account the properties of different pages, we present a new method of calculating support. Based on this method, U-TreeMiner is used to mine maximal frequent access patterns from Web logs. Experiments show that U-TreeMiner outperforms TreeMiner when the support is relatively low.
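As an illustration of the two representations the abstract names, a user session as a unique labeled tree and the depth-first traversal encoding string of that tree, the following Python sketch may help. It is not the thesis's implementation. The function names `session_to_tree` and `dfs_encoding`, the rule that a revisited page counts as a backtrack to its existing node, and the `-1` backtrack marker (borrowed from the convention commonly used in tree-mining work) are all assumptions made for this example.

```python
def session_to_tree(pages):
    """Build a unique labeled tree from one session's ordered page list.

    Assumption (not from the thesis): a new page becomes a child of the
    page visited just before it, and a revisit to a page already in the
    tree moves the current position back to that node.
    """
    if not pages:
        raise ValueError("a session needs at least one page")
    root = pages[0]
    children = {root: []}          # page label -> ordered list of child labels
    current = root
    for page in pages[1:]:
        if page in children:
            current = page         # revisit: labels are unique, so reuse the node
        else:
            children[current].append(page)
            children[page] = []
            current = page
    return root, children


def dfs_encoding(root, children, backtrack="-1"):
    """Return the preorder label sequence, emitting a marker after each subtree."""
    encoded = []

    def visit(node):
        encoded.append(node)
        for child in children[node]:
            visit(child)
            encoded.append(backtrack)  # step back up to the parent

    visit(root)
    return encoded


if __name__ == "__main__":
    root, children = session_to_tree(["A", "B", "C", "B", "D", "A", "E"])
    print(" ".join(dfs_encoding(root, children)))
```

Because every label occurs at most once in such a tree, the encoding string identifies the tree unambiguously, which is what makes string-based pattern growth over a database of session trees feasible.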
Keywords/Search Tags: Web Logs, Probability Association Graph, Unique Labeled Tree, Maximal Frequent Access Patterns