Research On Closed Pattern Based Data Mining Technologies

Posted on: 2017-09-15 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: M Han | Full Text: PDF
GTID: 1318330512975544 | Subject: Computer Science and Technology
Abstract/Summary:
A data stream is a continuous, unbounded, and time-varying sequence of data elements. Compared with traditional static data or databases, a data stream has very different characteristics: it is dynamic, infinite, ordered, non-repeating, high-speed, and changing. In real data stream environments, the distribution of some data sources changes over time, which is called concept drift. Such streams are known as evolving data streams or concept-drift data streams, and processing methods must therefore adapt to concept changes automatically. To obtain interesting, compact, and lossless frequent pattern result sets, this work studies constrained and closed frequent pattern mining algorithms. To improve the efficiency of data stream classification, it studies pattern-based classification algorithms. It also studies methods for handling concept drift during frequent pattern mining and classification. The main contributions are as follows.

(1) The main challenge of mining frequent patterns over a data stream is the huge number of discovered patterns, a consequence of the unbounded data. The number of patterns can explode, especially when the support threshold is low. In some data stream applications, the information embedded in recent transactions is of particular value. We therefore study a closed-operator-based method to improve the efficiency of closed pattern mining, and we design an average decay factor that balances high recall against high precision. We propose TDMCS, a method for efficiently mining closed frequent patterns over data streams based on the sliding window model and the time decay model. Experiments show that the proposed method is efficient and stable and that it outperforms comparable algorithms.

(2) Existing time decay factors weight recent and historical transactions with the same decay intensity, so they cannot distinguish the importance of new transactions from that of old ones. We therefore propose a novel way to design the decay factor based on the Gaussian function. Compared with existing methods, the decay intensity applied to new transactions is lower and the decay intensity applied to historical transactions is higher. Four time decay models based on different decay factors are designed, and an algorithm based on the Gaussian decay factor and accumulated values is designed to discover closed frequent patterns over data streams. The proposed methods are evaluated experimentally on both high-density and low-density data streams. Compared with the other approaches, setting the decay factor with the Gaussian function gives the algorithm better performance.

(3) High-dimensional data often contains many repeated items within a sequence. Mining such data can produce very large pattern sets that include small and discontinuous sequential patterns, which carry no useful information. Mining sequential patterns in such sequences requires considering different forms of patterns, such as contiguous patterns and local patterns that appear more than once in a particular sequence. Mining closed patterns yields not only a more compact result set but also better efficiency. This work presents MCCPM, a novel multi-support-based algorithm designed specifically for mining contiguous closed patterns in high-dimensional datasets. Three kinds of contiguous closed sequential patterns are mined: sequential patterns, local sequential patterns, and total sequential patterns. Experiments demonstrate that the proposed algorithm reduces memory consumption and generates compact patterns. A detailed analysis of the multi-support-based results is provided. These patterns can be used to match sequences or to classify unknown sequences.

(4) A data stream may contain a large amount of useless information or noise. Frequent pattern mining can discard such information and discover patterns that carry more information than a single attribute, so frequent and discriminative patterns can be used for effective classification. We propose PatHT, a two-step method that generates a decision tree for classifying evolving data streams. In the first step, an incremental algorithm, CCFPM, discovers the closed, class-constrained frequent pattern set CFPSet. In the second step, an incremental algorithm, HTreeGrow, trains a concept-drift decision tree based on CFPSet. A concept drift detector is used to discover concept changes so that the classification model is adjusted automatically. Pattern sets are used in different ways for high-density and low-density data streams. Experiments on synthetic and real-life data streams show that the proposed method outperforms comparable algorithms.
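
As a rough illustration of the ideas in items (1) and (2), the following Python sketch computes time-decayed supports over a single window and keeps only the closed frequent itemsets. It is not TDMCS or the Gaussian-decay algorithm from the dissertation, whose details the abstract does not give. The function names, the parameters `decay` and `sigma`, the exact form of the Gaussian weight, and the toy window are all assumptions made for this example, and the brute-force subset enumeration is only practical for very small transactions.

```python
from itertools import combinations
from math import exp


def exponential_weight(age, decay=0.9):
    """Same decay intensity at every age: one step back multiplies the weight by `decay`."""
    return decay ** age


def gaussian_weight(age, sigma=3.0):
    """Gaussian-shaped weight (assumed form): decays slowly when recent, quickly when old."""
    return exp(-(age ** 2) / (2 * sigma ** 2))


def decayed_supports(window, weight):
    """Map each itemset in the window to (decayed support, ids of covering transactions).

    `window` is ordered oldest to newest, so the newest transaction has age 0.
    Enumerating every subset is exponential in transaction length (toy windows only).
    """
    stats = {}
    newest = len(window) - 1
    for tid, transaction in enumerate(window):
        w = weight(newest - tid)
        items = sorted(set(transaction))
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                key = frozenset(subset)
                support, tids = stats.get(key, (0.0, frozenset()))
                stats[key] = (support + w, tids | {tid})
    return stats


def closed_frequent(stats, min_support):
    """Keep frequent itemsets that have no proper superset covering the same transactions."""
    frequent = {p: v for p, v in stats.items() if v[0] >= min_support}
    closed = {}
    for pattern, (support, tids) in frequent.items():
        absorbed = any(pattern < other and tids == other_tids
                       for other, (_, other_tids) in frequent.items())
        if not absorbed:
            closed[pattern] = support
    return closed


# Hypothetical toy window, oldest transaction first.
window = [["a", "b"], ["a", "b", "c"], ["a", "c"], ["a", "b", "c"]]
stats = decayed_supports(window, lambda age: exponential_weight(age, decay=0.5))
for pattern, support in sorted(closed_frequent(stats, 1.3).items(), key=lambda kv: -kv[1]):
    print(sorted(pattern), support)
```

With `decay=0.5` and a minimum decayed support of 1.3, the itemsets {b} and {c} are dropped because {a, b} and {a, c} cover exactly the same transactions, which is what makes the closed result set lossless yet smaller. Passing `gaussian_weight` instead of the exponential weight changes only the weighting: the ratio between the weights of consecutive ages is close to 1 for recent transactions and shrinks for older ones, which matches the lower decay intensity for new transactions and higher intensity for historical ones described in item (2).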
Keywords/Search Tags: Data streams, Frequent pattern mining, Closed patterns, Constrained patterns, Classification, Frequent sequential pattern mining