
Research On Frequent Sequence And Frequent Closed Sequence Mining Based On Minimal Location

Posted on: 2013-06-23 | Degree: Master | Type: Thesis
Country: China | Candidate: K Xiong | Full Text: PDF
GTID: 2298330467478173 | Subject: Computer application technology
Abstract/Summary:
The spread of networks, the rapid development of computer technology, and advances in information management and information systems have led to a rapid growth of data in all kinds of industries. This situation gave rise to Knowledge Discovery in Databases (KDD) and Data Mining (DM), which offer a new way to understand data. Data mining is the process of discovering previously unknown, potentially valuable information and knowledge from large amounts of incomplete, noisy, and random data. Sequential pattern mining is an important area of data mining; its purpose is to discover hidden and interesting sequential relationships in large sequence databases. This thesis studies frequent sequence mining and frequent closed sequence mining.

To mine sequential patterns efficiently, the thesis builds the ML-list structure, drawing on conventional sequential pattern mining algorithms and the characteristics of sequential patterns. Based on this structure, it proposes a new frequent sequence mining algorithm, FSM_BML, and a new frequent closed sequence mining algorithm, FCSM_BASC. The work focuses on reducing the number of times records in the original sequence database must be scanned. It also proposes a method that speeds up support computation and a closure-checking method that is performed only among adjacent sequences. The main contributions are as follows.

First, the thesis uses uniform record numbers to avoid scanning every record in the original sequence database, or every projection in the projected databases, as conventional frequent sequence mining algorithms do.

Second, the thesis proposes a method that uses the minimal location of sequences to quickly determine where a search should begin. This makes the search more targeted: unlike traditional frequent sequence mining algorithms, it avoids searching from the beginning every time, which improves the efficiency of frequent sequence mining.

Third, the thesis proposes a closed sub-pattern checking method that is performed only among adjacent sequences. This greatly narrows the range of checking and retains each closed sequence candidate only once, which substantially improves the efficiency of frequent closed sequence mining.

Finally, the thesis proposes a pruning method called partial redundancy-escape pruning, which identifies some non-closed sequences early and avoids part of the support counting. It also proposes a second pruning method called redundancy-escape pruning, which identifies some closed sequences early and prunes frequent x-sequences that cannot be extended into frequent (x+1)-sequences, thereby reducing the search space. Together, these two pruning methods reduce the time needed for frequent closed sequence mining.

Experimental results demonstrate the correctness and efficiency of the proposed algorithms.
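The abstract does not spell out the ML-list layout or the exact search procedure, so the following is only a minimal sketch of the general minimal-location idea under stated assumptions. It assumes every sequence element is a single item (no itemsets), and it represents a pattern by a hypothetical map from sequence id to the smallest index at which the pattern can end in that sequence. Extending a pattern then scans each sequence only after that index instead of from the beginning. The closedness filter is a naive pairwise check against all longer frequent sequences; it is not the thesis's adjacent-sequence method, and neither pruning rule is modeled. All names (`mine_frequent`, `is_subsequence`, `closed_only`, `example_db`) are illustrative.

```python
from collections import defaultdict

# Assumption: each sequence is a list of single items (no itemsets).
example_db: list[list[str]] = [
    ["a", "b", "c"],
    ["a", "c"],
    ["a", "b", "c"],
    ["b", "c"],
]


def mine_frequent(db: list[list[str]], min_sup: int) -> dict[tuple[str, ...], int]:
    """Depth-first frequent sequence mining driven by minimal locations.

    For a pattern p, locs[sid] is the smallest index in db[sid] at which an
    occurrence of p ends. Extending p only scans db[sid] after locs[sid],
    and only for sequences that already contain p.
    """
    result: dict[tuple[str, ...], int] = {}

    def extend(pattern: tuple[str, ...], locs: dict[int, int]) -> None:
        # next_locs[item][sid]: first index of item after the minimal location.
        next_locs: dict[str, dict[int, int]] = defaultdict(dict)
        for sid, pos in locs.items():
            seq = db[sid]
            for i in range(pos + 1, len(seq)):
                # setdefault keeps only the first (minimal) occurrence.
                next_locs[seq[i]].setdefault(sid, i)
        for item in sorted(next_locs):
            new_locs = next_locs[item]
            if len(new_locs) >= min_sup:  # support = number of sequences
                new_pattern = pattern + (item,)
                result[new_pattern] = len(new_locs)
                extend(new_pattern, new_locs)

    # The empty pattern "ends" just before index 0 in every sequence.
    extend((), {sid: -1 for sid in range(len(db))})
    return result


def is_subsequence(p: tuple[str, ...], q: tuple[str, ...]) -> bool:
    # Each `x in it` consumes the iterator, so item order is respected.
    it = iter(q)
    return all(x in it for x in p)


def closed_only(freq: dict[tuple[str, ...], int]) -> dict[tuple[str, ...], int]:
    # Naive check: p is closed if no longer frequent q containing p
    # has the same support.
    return {
        p: s
        for p, s in freq.items()
        if not any(
            len(q) > len(p) and s_q == s and is_subsequence(p, q)
            for q, s_q in freq.items()
        )
    }


if __name__ == "__main__":
    frequent = mine_frequent(example_db, min_sup=2)
    print(frequent)
    print(closed_only(frequent))
```

On the four-sequence `example_db` with a minimum support of 2, the sketch finds seven frequent sequences, of which (c), (a, c), (b, c) and (a, b, c) are closed. Greedy leftmost matching is what makes the minimal location safe to reuse: if a pattern occurs in a sequence at all, its earliest end position leaves the most room for later items, so no valid extension is missed.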
Keywords/Search Tags: Frequent sequence mining, minimal location, closed sequence mining, adjacent sequence