Research On Algorithms For Mining Web Logs

Posted on: 2007-12-08

Degree: Master

Type: Thesis

Country: China

Candidate: Y B Zhan

Full Text: PDF

GTID: 2178360215470208

Subject: Computer Science and Technology

Abstract/Summary:

With the rapid development of the Internet, the Web has become the main platform for producing, publishing, handling, and processing information. However, Web information is expanding so quickly that it causes a serious problem, the "information explosion": information is abundant, but knowledge is relatively scarce. Web mining has emerged in response. Within Web mining, Web usage mining, which mines the logs recorded on Web servers, has attracted intensive research interest in recent years.

This thesis presents our work on Web usage mining. First, we introduce the basic concepts and methods of data mining and Web mining. Next, we analyze the characteristics of Web logs and study Web log preprocessing techniques in depth. Finally, we study an algorithm for mining association rules based on a probability association graph and an algorithm for mining maximal frequent access patterns from Web logs. The main research results are as follows:

1. We analyze the characteristics of Web logs and their collection process, describe the Web log preprocessing procedure, and propose some improvements that address problems in conventional preprocessing methods.
2. Combining the topological structure graph of the Web site with an analysis of user access behavior, we present the probability association graph, in which the user access process is represented as a directed graph. Based on this graph, we propose a novel algorithm for mining association rules. We show theoretically that the algorithm outperforms existing algorithms; its running time is O(n^3).
3. After analyzing existing algorithms for mining labeled trees, we propose U-TreeMiner, a novel pattern-growth algorithm that mines maximal frequent subtrees from a database of unique labeled trees, based on the depth-first traversal encoding string of a unique labeled tree. Experiments show that U-TreeMiner is more efficient than other algorithms.
4. We represent each user session as a unique labeled tree and, taking into account the properties of different pages, present a new method for calculating support. Using this method, U-TreeMiner is applied to mine maximal frequent access patterns from Web logs. Experiments show that U-TreeMiner outperforms TreeMiner when the support threshold is relatively low.
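
The abstract does not spell out how a session becomes a unique labeled tree or how its depth-first encoding string is built, so the following is only a minimal sketch under stated assumptions, not the thesis's own procedure. It assumes that revisiting a page already seen in the session means the user navigated back to it, and it uses the common convention of writing labels in preorder with a backtrack marker (here "-1") after each child subtree. The function names `session_to_tree` and `dfs_encode` are hypothetical.

```python
def session_to_tree(pages):
    """Build a tree (page -> list of child pages) from one session.

    Assumption: a page that is already in the tree is a back-navigation,
    so the cursor moves to it and the next new page becomes its child.
    Each page label therefore appears at most once in the tree.
    """
    root = pages[0]
    children = {root: []}
    current = root
    for page in pages[1:]:
        if page in children:
            current = page                  # user went back to a seen page
        else:
            children[current].append(page)  # new page hangs off the cursor
            children[page] = []
            current = page
    return root, children


def dfs_encode(root, children, backtrack="-1"):
    """Return the preorder label sequence with a backtrack marker
    emitted each time the traversal returns from a child to its parent."""
    encoding = []

    def visit(node):
        encoding.append(node)
        for child in children[node]:
            visit(child)
            encoding.append(backtrack)

    visit(root)
    return encoding


if __name__ == "__main__":
    # Hypothetical session: A -> B -> C, back to B -> D, back to A -> E
    root, children = session_to_tree(["A", "B", "C", "B", "D", "A", "E"])
    print(" ".join(dfs_encode(root, children)))
```

Because every label is unique within a tree, two sessions with the same structure produce the same encoding string, which is what makes string-based pattern growth over such trees practical.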

Keywords/Search Tags:

Web Logs, Probability Association Graph, Unique Labeled Tree, Maximal Frequent Access Patterns

Related items

1	Research On Technologies And Algorithms Of Web Logs Mining
2	Research On Mining Algorithms Of Maximal Frequent Item Sets
3	The Research On The Algorithms Of Mining Distributed Maximal Frequent Patterns
4	Study On Frequent Pattern Mining Algorithms And Pruning Strategies
5	Research On Algorithms For Mining Maximal Frequent Itemsets
6	Research On The Algorithm For Mining Continuous Frequent Access Patterns From Web Logs
7	The Research On The Related Problems Of Association Rule Mining
8	Research On Mining Frequent Pattern Based On The Optimized FP-Tree In Data Streams
9	Research On Data Mining Technology For Very Large Databases
10	Research On Key Algorithms For Mining Frequent Patterns In Data Streams And Their Application In Simulation System