Research On The Application Of A Frequent Sub-tree Algorithm In Web-log Mining

Posted on: 2008-01-27
Degree: Master
Type: Thesis
Country: China
Candidate: Z C Liu
Full Text: PDF
GTID: 2178360272468282
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of the Internet, and especially the spread of Web sites, the World Wide Web has become the richest and largest information source in the world. Mature data mining technologies are well suited to mining Web data. Web mining, the extension of data mining to the analysis and processing of Web data, has naturally become one of the most active research topics.
Web mining comprises Web content mining, Web structure mining and Web usage mining, which mine, respectively, the content of Web pages, the structure within and between Web pages, and Web users' usage information. Frequent pattern mining is one of the primary tasks of data mining, and researchers have studied frequent itemset mining and sequential pattern mining in depth. Recently, however, emerging fields such as bioinformatics, digital libraries and e-commerce have come to require techniques for mining complex frequent structures. In particular, mining frequent sub-trees in a forest can provide important knowledge for user pattern analysis and for Web user classification and clustering in Web-log mining.
Mining frequent sub-trees in a labeled tree database is an important research direction in frequent sub-tree mining. Previous studies indicate that sequential pattern mining algorithms based on the pattern growth method perform particularly well. The Scalable Frequent sub-Tree Mining algorithm (SFTM) applies the pattern growth method to mining frequent sub-trees in a labeled tree database and improves the pruning of the search space. We apply SFTM to Web-log mining by designing and implementing Webloger, a Web-log mining tool based on the frequent sub-tree mining algorithm.
Within the Webloger architecture, we compare SFTM experimentally with conventional algorithms on a generated dataset and on a real dataset. The results show that SFTM is effective and efficient and that its search space shrinks rapidly during mining. On the real Web-log data in particular, it shows a clear advantage over conventional algorithms.
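To make the notion of a frequent sub-tree in a forest concrete, the following is a minimal sketch of support counting for one candidate pattern. It is not the SFTM algorithm, which the abstract does not specify, and it does not use pattern growth; it is a brute-force check. The tree representation (`(label, children)` tuples), the function names, the page labels and the choice of induced, ordered sub-tree matching are all assumptions made for illustration; the thesis may define sub-tree containment differently.

```python
# Hypothetical representation: a labeled ordered tree is (label, [children]).
# Each tree could stand for one user session reconstructed from a Web log.

def matches_at(pattern, node):
    """True if `pattern` occurs as an induced ordered sub-tree rooted at `node`."""
    p_label, p_children = pattern
    n_label, n_children = node
    if p_label != n_label:
        return False
    i = 0  # index of the next unused child of `node`
    for p_child in p_children:
        # Greedily take the earliest remaining child that matches, so that
        # the left-to-right order of the pattern's children is preserved.
        while i < len(n_children) and not matches_at(p_child, n_children[i]):
            i += 1
        if i == len(n_children):
            return False
        i += 1
    return True

def contains(pattern, tree):
    """True if `pattern` occurs at the root of `tree` or at any descendant."""
    if matches_at(pattern, tree):
        return True
    return any(contains(pattern, child) for child in tree[1])

def support(pattern, forest):
    """Number of trees in `forest` that contain `pattern` at least once."""
    return sum(1 for tree in forest if contains(pattern, tree))

# Illustrative sessions (made-up page labels, not data from the thesis).
forest = [
    ("home", [("products", [("item", [])]), ("cart", [])]),
    ("home", [("products", []), ("cart", [])]),
    ("home", [("cart", []), ("products", [])]),
]
pattern = ("home", [("products", []), ("cart", [])])
print(support(pattern, forest))
```

Under these assumptions, a pattern counts as frequent when its support reaches a user-chosen minimum. The third session does not contain the pattern because the trees are ordered and `cart` precedes `products` there.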
Keywords/Search Tags: Data mining, frequent pattern, frequent sub-tree, sequence database, labeled ordered tree