The Research And Implementation Of Crucial Problems In Sequential Pattern Mining

Posted on: 2006-09-05
Degree: Master
Type: Thesis
Country: China
Candidate: Y Sun
GTID: 2168360152490288
Subject: Computer application technology
Abstract/Summary:
Knowledge discovery in databases (KDD) is a rapidly emerging research field related to artificial intelligence and database systems. The discovery of sequential patterns is an important topic in KDD. There are three main problems in sequential pattern discovery: (1) Traditional sequential pattern mining algorithms, such as AprioriAll, need to scan the database multiple times, so their time performance is poor. (2) The aim of traditional sequential pattern mining is to discover all frequent sequences. The whole process lacks focus, is time-consuming, and often generates a large number of patterns. (3) In practical use, setting min_support is a subtle task. This dissertation addresses the problems mentioned above. The main content is as follows:

Traditional sequential pattern mining algorithms require multiple scans of the database, so the process is time-consuming. The Extended model of Concept Lattice (ECL) is suitable for discovering various kinds of knowledge, including sequential patterns. Considering the characteristics of sequential pattern mining, the Frequent Concept Lattice (FCL), built by pruning the ECL, can improve mining efficiency. Since the key to FCL-based sequential pattern mining is building the FCL quickly, a layered FCL building algorithm (FL_Chein) is proposed that needs to scan the database only once. Based on this algorithm, an FCL-based sequential pattern mining algorithm (SECLSP) is implemented.

The result of traditional sequential pattern mining is a large set of frequent sequences that is hard to understand. Mining closed patterns can greatly reduce the number of redundant sequences without loss of information. Based on the idea of mining from the top rather than from the bottom, an effective closed sequential pattern mining algorithm (Multi-pass CS) is presented.

The top-k closed pattern is proposed as an extension of the closed pattern. Mining top-k closed sequential patterns can solve the third problem of traditional sequential pattern mining, the setting of min_support. A new top-k closed sequential pattern mining algorithm (TKCS) is proposed based on the Multi-pass CS algorithm.

A prototype system is built based on the previous work. All the proposed algorithms have been implemented and tested. The experimental results show the superiority of these algorithms.
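
To make the terms used in the abstract concrete, the following is a minimal brute-force sketch in Python of support counting, closed sequential patterns, and top-k selection. It is not the Multi-pass CS or TKCS algorithm from the thesis, whose details are not given here. The function names, the toy database, the simplification to sequences of single items, and the tie-breaking rule used for ranking are all assumptions made for illustration.

```python
from itertools import combinations


def is_subsequence(pattern, sequence):
    """Return True if pattern occurs in sequence in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def support(pattern, database):
    """Number of database sequences that contain the pattern."""
    return sum(is_subsequence(pattern, seq) for seq in database)


def frequent_patterns(database, min_support):
    """Brute force: test every subsequence of every database sequence.

    Exponential in sequence length, so only suitable for toy inputs.
    """
    candidates = set()
    for seq in database:
        for n in range(1, len(seq) + 1):
            for idx in combinations(range(len(seq)), n):
                candidates.add(tuple(seq[i] for i in idx))
    result = {}
    for pattern in candidates:
        count = support(pattern, database)
        if count >= min_support:
            result[pattern] = count
    return result


def closed_patterns(frequent):
    """Keep patterns that have no longer supersequence with equal support."""
    closed = {}
    for p, s in frequent.items():
        absorbed = any(
            len(q) > len(p) and t == s and is_subsequence(p, q)
            for q, t in frequent.items()
        )
        if not absorbed:
            closed[p] = s
    return closed


def top_k_closed(database, k):
    """Return the k closed patterns with the highest support.

    No min_support is needed; ties are broken by longer pattern first,
    then lexicographically (an assumed rule, chosen for determinism).
    """
    closed = closed_patterns(frequent_patterns(database, 1))
    ranked = sorted(closed.items(), key=lambda kv: (-kv[1], -len(kv[0]), kv[0]))
    return ranked[:k]


# Hypothetical toy database: each tuple is one customer's item sequence.
database = [("a", "b", "c"), ("a", "c"), ("a", "b", "c"), ("b", "c")]
frequent = frequent_patterns(database, 2)
closed = closed_patterns(frequent)
top2 = top_k_closed(database, 2)
```

On this toy database, the seven frequent sequences found at min_support = 2 reduce to four closed ones, and the top-k variant asks the user for a count k rather than a support threshold.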
Keywords/Search Tags: KDD, frequent pattern, sequential pattern, concept lattice, closed pattern