Sequential Pattern Mining With Non-overlapping Constraints

Posted on: 2016-10-30 | Degree: Master | Type: Thesis
Country: China | Candidate: T Xi | Full Text: PDF
GTID: 2348330536986827 | Subject: Computer technology
Abstract/Summary:
Data mining, an active research area, aims to extract hidden, valuable information from large volumes of data. Sequential pattern mining, one of its important branches, finds frequent patterns, that is, patterns that occur at least a given number of times, and is widely used in fields such as biomedical research and information retrieval. The non-overlapping constraint requires that no two occurrences of a pattern use the same character of the sequence at the same position of the pattern. Sequential pattern mining with non-overlapping constraints is the problem of finding all patterns whose number of non-overlapping occurrences is no less than a given threshold. Compared with traditional sequential pattern mining, frequent patterns mined under non-overlapping constraints can better meet users' needs. This thesis therefore studies sequential pattern mining with non-overlapping constraints. The main contributions are as follows:

(1) The existing algorithm INSgrow can miss feasible occurrences, which makes the mining algorithm GSgrow incomplete. To address this, we propose a method based on the Nettree structure that counts occurrences by searching from the roots of the Nettree to its leaves, and we design NORL (Non-Overlapping Constraint using Nettree from Root to Leaf), a complete algorithm for counting occurrences under non-overlapping constraints. On this basis, we propose two complete mining algorithms: the depth-first algorithm MAFPD (Mining All Frequent Patterns using Nettree with Depth first search) and the breadth-first algorithm MAFPB (Mining All Frequent Patterns using Nettree with Breadth first search).

(2) We run extensive comparative experiments with GSgrow, MAFPD and MAFPB on DNA and other data sets, and analyze both the mined patterns and the running time of each algorithm. The results show that MAFPD and MAFPB mine the same patterns, and that both find more frequent patterns than GSgrow. This confirms the incompleteness of GSgrow and supports the correctness of MAFPB and MAFPD. MAFPB also runs faster than MAFPD, which indicates that MAFPB makes better use of the Apriori property, with low space consumption and fast running speed, and confirms the effectiveness of the proposed algorithms.
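The non-overlapping constraint described in the abstract can be illustrated with a small sketch. It is not the thesis's NORL algorithm, which works on a Nettree; it is a simple leftmost greedy search written only to show what the constraint allows and forbids. The function names `is_non_overlapping` and `greedy_occurrences` are hypothetical, and the sketch assumes no gap constraints between consecutive pattern characters and that an occurrence is recorded as a tuple of sequence positions, one per pattern character.

```python
from typing import List, Tuple

# An occurrence lists the sequence position matched by each pattern character.
Occurrence = Tuple[int, ...]


def is_non_overlapping(occurrences: List[Occurrence]) -> bool:
    """Return True if no two occurrences share a sequence position
    at the same pattern index (all occurrences have equal length)."""
    if not occurrences:
        return True
    seen = [set() for _ in range(len(occurrences[0]))]
    for occ in occurrences:
        for j, pos in enumerate(occ):
            if pos in seen[j]:
                return False
            seen[j].add(pos)
    return True


def greedy_occurrences(sequence: str, pattern: str) -> List[Occurrence]:
    """Collect occurrences greedily, always taking the leftmost position
    not yet used at the current pattern index (illustrative only)."""
    if not pattern:
        return []
    used = [set() for _ in pattern]  # used[j]: positions taken at index j
    found: List[Occurrence] = []
    while True:
        occ = []
        prev = -1
        for j, ch in enumerate(pattern):
            pos = next(
                (i for i in range(prev + 1, len(sequence))
                 if sequence[i] == ch and i not in used[j]),
                None,
            )
            if pos is None:
                return found  # no further occurrence can be built
            occ.append(pos)
            prev = pos
        for j, pos in enumerate(occ):
            used[j].add(pos)
        found.append(tuple(occ))


if __name__ == "__main__":
    occs = greedy_occurrences("AAA", "AA")
    print(occs, is_non_overlapping(occs))
```

For the sequence `AAA` and the pattern `AA`, the sketch finds the occurrences (0, 1) and (1, 2). Position 1 appears in both, but at different pattern indices, so the pair is non-overlapping. The pair (0, 1) and (0, 2) reuses position 0 at the first pattern index and is rejected by `is_non_overlapping`.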
Keywords/Search Tags:Sequential pattern mining, Non-overlapping constraints, Nettree, Pattern matching, Apriori property