Research On The Application Of A Frequent Sub-tree Algorithm In Web-log Mining

Posted on:2008-01-27

Degree:Master

Type:Thesis

Country:China

Candidate:Z C Liu

Full Text:PDF

GTID:2178360272468282

Subject:Computer software and theory

Abstract/Summary:

With the rapid development of the Internet, and especially the growing popularity of Web sites, the World Wide Web has become the largest and richest information source in the world. Mature data mining technologies are well suited to mining Web data. Web mining, the extension of data mining to the analysis and processing of Web data, has naturally become one of the most active research topics.

Web mining comprises Web content mining, Web structure mining, and Web usage mining, which mine the content of Web pages, the structure within and between Web pages, and users' usage information, respectively. Frequent pattern mining is one of the primary tasks of data mining, and researchers have studied frequent itemset mining and sequential pattern mining in depth. More recently, emerging fields such as bioinformatics, digital libraries, and e-commerce have required techniques for mining complex frequent structures. In particular, mining frequent sub-trees in a forest can provide important knowledge for user pattern analysis and for Web user classification and clustering in Web-log mining.

Mining frequent sub-trees in a labeled tree database is an important research direction in frequent sub-tree mining. Previous studies indicate that sequential pattern mining algorithms based on the pattern growth method perform very well. The Scalable Frequent sub-Tree Mining algorithm (SFTM) applies the pattern growth method to mining frequent sub-trees in a labeled tree database and improves the pruning of the search space. By designing and implementing Webloger, a Web-log mining tool based on a frequent sub-tree mining algorithm, we apply the SFTM algorithm to Web-log mining.

Under the Webloger architecture, we compare SFTM with conventional algorithms in experiments on a generated dataset and a real dataset. The experimental results demonstrate that SFTM is effective and efficient and that its search space shrinks rapidly during the mining process. On real Web-log data in particular, it shows a clear advantage over conventional algorithms.
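
As a rough illustration of what counting support over a forest of labeled trees looks like, the sketch below counts only the simplest sub-tree patterns, parent-child label pairs, across a few session trees. It is not SFTM and does not use pattern growth or search-space pruning. The tree encoding, the function names `edges` and `frequent_edges`, and the sample URLs are all assumptions made for illustration.

```python
from collections import Counter

# Assumed encoding: a tree is (label, [children]).
# Each tree stands for one hypothetical user session in a Web log.

def edges(tree):
    """Yield every (parent_label, child_label) pair in the tree."""
    label, children = tree
    for child in children:
        yield (label, child[0])
        yield from edges(child)

def frequent_edges(forest, min_support):
    """Return the edges that occur in at least min_support trees."""
    support = Counter()
    for tree in forest:
        # Use a set so an edge is counted at most once per tree.
        support.update(set(edges(tree)))
    return {edge: n for edge, n in support.items() if n >= min_support}

# Hypothetical sessions; the URLs are made up for this example.
forest = [
    ("/", [("/news", [("/news/a", [])]), ("/shop", [])]),
    ("/", [("/news", [("/news/b", [])])]),
    ("/", [("/shop", [("/shop/cart", [])])]),
]

result = frequent_edges(forest, min_support=2)
print(result)
```

With a minimum support of 2, only the pairs from `/` to `/news` and from `/` to `/shop` survive, since each appears in two of the three sessions. A full frequent sub-tree miner would extend such small frequent patterns into larger ones.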

Keywords/Search Tags:

Data mining, frequent pattern, frequent sub-tree, sequence database, labeled ordered tree

Related items

1	Research On Frequent Pattern Mining In XML
2	Research On Mining Algorithms Of Maximal Frequent Item Sets
3	The Research On The Related Problems Of Association Rule Mining
4	Research On Mining Algorithm Of Association Rules Based On Frequent Pattern Tree
5	Research On Frequent Pattern Of Tree Data
6	The Research On Frequent Subtrees Mining And Corresponding Techniques
7	Research On Mining Frequent Pattern Based On The Optimized FP-Tree In Data Streams
8	Study On Association Rules Mining Algorithm Based On FP-tree
9	The Analysis, Based On Data Mining Algorithms For Frequent Pattern Tree
10	A Study On Algorithms Of Weighted Frequent Pattern Mining