Research On An Algorithm For Time Sequential Pattern Mining

Posted on: 2008-02-05
Degree: Master
Type: Thesis
Country: China
Candidate: Y Yan
Full Text: PDF
GTID: 2178360242958955
Subject: Computer application technology
Abstract/Summary:
Data mining has become one of the fastest-growing areas of research in recent years. Beyond association rule mining, researchers are working to develop mining methods that take the time factor into account. Popular research topics include customer buying-pattern analysis, Internet surfing time-series analysis, and trend analysis. When analyzing customers' buying patterns over time, most existing mining methods require repeated database scans to generate candidate patterns, which are then checked to find the frequent time-series patterns. Sequential pattern mining is one of the most active topics in data mining. Its purpose is to find the frequent sequences in transaction databases and then use these patterns to support decision-makers. The concept of a sequential pattern is introduced to capture typical behaviours over time, i.e. behaviours repeated by enough individuals to be relevant to the decision-maker. We are given a database of sequences, where each sequence is a list of transactions ordered by transaction time and each transaction is a set of items. The problem is to discover all sequential patterns that meet a user-specified minimum support, where the support of a pattern is the number of data-sequences that contain the pattern.

Execution efficiency is one of the important problems in data mining. The AprioriAll algorithm is one method for finding sequential patterns, but it has high space and time complexity. Therefore, this dissertation introduces a new algorithm, based on an adjacency matrix, that does not need to generate candidate itemsets. The algorithm produces frequent patterns by joining suffixes with prefixes, which avoids scanning the database many times and lowers the time cost. In this paper, we present an approach for mining sequential patterns embedded in a database.
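The problem statement above defines support as the number of data-sequences that contain a pattern. As a minimal illustration of that definition only (not the adjacency-matrix algorithm of the thesis), the Python sketch below counts support by direct scanning. The names `contains`, `support`, and `example_db`, and the toy data, are hypothetical and chosen for illustration. It assumes the usual containment rule: each itemset of the pattern must be a subset of a distinct transaction, and the matched transactions must appear in the same order as in the pattern.

```python
from typing import FrozenSet, List

Itemset = FrozenSet[str]
DataSequence = List[Itemset]   # transactions ordered by transaction time


def contains(data_seq: DataSequence, pattern: List[Itemset]) -> bool:
    """Return True if `pattern` is contained in `data_seq`.

    Greedy left-to-right matching: each pattern itemset is matched to the
    earliest remaining transaction that is a superset of it.
    """
    pos = 0
    for element in pattern:
        # Skip transactions that do not include every item of `element`.
        while pos < len(data_seq) and not element <= data_seq[pos]:
            pos += 1
        if pos == len(data_seq):
            return False
        pos += 1  # the next pattern itemset must match a later transaction
    return True


def support(database: List[DataSequence], pattern: List[Itemset]) -> int:
    """Number of data-sequences in `database` that contain `pattern`."""
    return sum(1 for seq in database if contains(seq, pattern))


# Hypothetical toy database: three customers, each with ordered transactions.
example_db: List[DataSequence] = [
    [frozenset({"bread"}), frozenset({"milk", "eggs"}), frozenset({"butter"})],
    [frozenset({"bread", "milk"}), frozenset({"butter"})],
    [frozenset({"milk"}), frozenset({"bread"})],
]

bread_then_butter = [frozenset({"bread"}), frozenset({"butter"})]
print(support(example_db, bread_then_butter))
```

With a user-specified minimum support of 2, the pattern "bread, then butter" would count as frequent in this toy database, because the first two data-sequences contain it. This naive check rescans every sequence for every pattern, which is exactly the cost that motivates algorithms designed to avoid repeated database scans.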
The algorithm mines sequential patterns over a database of sequences. It uses a new data structure, which we name the "sequences thread tree". We then discuss the algorithm in detail. We tested the algorithm on several synthetic data sets, and the key algorithms were tested and verified. The impact of the parameters on mining performance and results was measured and analyzed. The performance of TTSP and FPAM is compared; the empirical evaluation indicates that the incremental approach of the algorithm is sound and that it is much faster than conventional mining. At the same time, the algorithm scales linearly with the number of data-sequences and has very good scale-up properties with respect to the average data-sequence size.
Keywords/Search Tags:data mining, sequence patterns, sequences thread tree, incremental data mining