Font Size: a A A

An Algorithm For Mining Embedded Disordered Closed Subtree In Tree Stream

Posted on:2012-02-28Degree:MasterType:Thesis
Country:ChinaCandidate:Y P ZhangFull Text:PDF
GTID:2218330338956623Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
With the development of information technology, data mining in data stream is one of the issues of challenging。There are many data streams in real-time application, such as sensor data in sensor networks, web of web records etc.。Traditional "first memory after the processing" of data mining technology which can not solve faster speed, single scanning, data volume and other characteristics of stream mining has become increasingly powerless。Tree stream is one of the areas most widely used, so mining frequent subtree is very useful in tree stream。It has been expanded by the number of subtree and efficiency of computing support that mining frequent subtree in tree stream。This article first proposed the theory of linear expansion strategy。It can easily determine where to insert nodes and the subtree binomial encoding of new candidate。Then put forward a data structure LTPS which used to store tree items to avoid the multiple-pass binomial encoding。Based on this structure we proposed DFLinApri algorithm whilch can enumerate embedded disordered subtrees in tree stream and effectively calculate the support of the candidate subtrees。Last, we proposed a data structure TID which is the simplification of the LTPS。Based on it we proposed BFLinApri algorithm which improve DFLinApri and eliminate the redundancy。In this article there are two classes data sets that are synthetic datasets (F5 and D10)and real datasets(cslogs)。Results suggested that the two algorithms in this article have more time efficiency than PrefixTreeISpan and TreeMiner in two different data sets which are F5 and D10。Besides, they also indicated that the number of closed subtree produced by the two is obviously fewer than the number of frequent subtree produced by PrefixTreeISpan。The runtime of TreeMiner is two times of BFLinApri on cslogs datasets。...
Keywords/Search Tags:data mining, tree stream, closed frequent subtree, embedded tree, disordered tree
PDF Full Text Request
Related items