An Efficient Algorithm With Incremental Data Mining For Mining Sequential Pattern(NPSP)

Posted on:2004-05-24

Degree:Master

Type:Thesis

Country:China

Candidate:B Zhang

Full Text:PDF

GTID:2168360122460460

Subject:Computer software and theory

Abstract/Summary:

PDF Full Text Request

Recently, our capabilities of both generating and collecting data have been increasing rapidly. The computerization of many business and government transactions, and the advances in data collection tools have provided us with huge amounts of data, millions of databases have been used in business management, government administration, scientific and engineering data management, and many other applications. It is noted that the mumber of such databases keeps growing rapidly because of the availability of powerful and affordable database systems. This explosive growth in data and databases has generated an urgent need for new techniques and tools that can intelligently and automatically transform the processed data into useful information and knowledge. Consequently, data mining has become a research area with increasing importance.The problem of sequential pattern is an important branch of data mining. The concept of sequential pattern is introduced to capture typical behaviours over time, i.e. behaviours sufficiently repeated by individuals to be relevant for the decision maker. If we are given a database of sequences, where each sequence is a list of transactions ordered by transaction-time, and each transaction is a set of items. The problem is to discover all sequential patterns with a user-specified minimum support, where the support of a pattern is the number of data-sequences that contain the pattern. An example of a sequence pattern is "5% of customers bought 'Foundation' and 'Ringworld' in one transaction, followed by 'Second Foundation' in a later transaction". We generalize the problem as follows. First, we add time constraints that specify a minimum and/or maximum time period between adjacent elements in a pattern. Second, we allow the items to be present in a set of transactions whose transaction-times are within a user-specified time window. Third, given a user-defined taxonomy(is a hierarchy) on items, we allow sequential patterns to include items across all levels of the taxonomy.The GSP algorithm and the PSP algorithm is the main two algorithms.They both scale linearly with the number of data-sequences, and have very good scale-up properties with respect to the average data-sequence size, but they both do not have the function of incremental data mining. In this paper, we present an approach, called NPSP(New Perfectly Sequential Pattern),for mining sequential patterns embedded in a database. Different from the algorithm of GSP or PSP, the NPSP algorithm can mining sequential patterns over a database of sequences with increment.In the NPSP algorithm, we use a new data structure and we name it "Heterogeneity Tree". Then we discuss the algorithm in detail. We experimented on the incremental function of the NPSP algorithm using several synthetic data, and empirical evaluation indicates that the incremental idea of the NPSP algorithm is right and is much faster than the normal mining(not using the incremental function). At the same time, NPSP scales linearly with the number of data-sequences, and has very good scale-up properties with respect to the average data-sequence size.

Keywords/Search Tags:

Data Mining, Sequence Patterns, Heterogeneity Tree, the NPSP algorithm, Incremental Data Mining

PDF Full Text Request

Related items

1	Research On An Algorithm For Time Sequential Pattern Mining
2	Research On Mining Algorithm Of Web Log Frequent Sequential Patterns
3	Mining Shared Knowledge\Patterns Between Two Datasets
4	Data Mining Algorithm And Its Application In The Tourism Industry
5	Research On Frequent Subtree Mining
6	Enhanced PL-WAP tree method for incremental mining of sequential patterns
7	Incremental Web Data Mining System Based On Classification Tree
8	Research On Itemset Sequential Patterns Mining Algorithm Based On Memory Indexing&Bitmap
9	The Incremental Mining Algorithm Based On HFUFP-tree
10	Research On Incremental Mining Algorithm And Application For Dynamic Databases