
Research On Algorithms For Discovering And Querying Sequential Pattern In Uncertain Sequence Databases

Posted on: 2012-05-02  Degree: Master  Type: Thesis
Country: China  Candidate: D J Miao  Full Text: PDF
GTID: 2218330362450418  Subject: Computer Science and Technology
Abstract/Summary:
With the development of modern technologies for data acquisition, processing, and forecasting, uncertain sequence data have become widespread in applications in science, communications, logistics, finance, and other fields. This thesis first addresses the general form of frequent pattern mining in uncertain sequence databases. Second, for uncertain databases with local correlations, it studies how to process nearest neighbor pattern queries over their snapshot sequences. Mining previously unknown sequential patterns, which may contain valuable rules for users, is an important aspect of uncertain sequence data mining. Compared with mining a deterministic sequence database, mining sequential patterns in an uncertain sequence database is more complex and usually faces a huge search space. Because the two data types differ, the pattern-growth methods widely used in sequence mining cannot be applied directly to uncertain sequential pattern mining. This thesis also presents a novel definition of the probabilistic frequent nearest neighbor query on the snapshot sequence of an uncertain database, whose goal is to find those objects that are the nearest neighbor of the query pattern with a probability above a specified threshold. However, because of the local correlation, existing nearest neighbor algorithms for traditional or uncertain data cannot be used directly: they cannot cope with the huge search space or with the large time overhead of accessing conditional probability tables. For the mining problem, the thesis proposes a polynomial-time algorithm for fast ETP calculation, together with corresponding pruning strategies. For the query problem, it presents a general framework for handling probabilistic frequent nearest neighbor queries, along with corresponding filtering strategies.
We conduct extensive experiments on synthetic and real data, which verify the effectiveness and correctness of the proposed algorithms.
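
To illustrate why a polynomial-time computation is possible in uncertain sequence mining at all, here is a minimal sketch. It is not the thesis's ETP algorithm, which the abstract does not define. It assumes a simple model that the abstract does not state: each element of a sequence exists independently with a given probability. Under that assumption, the probability that a pattern occurs as a subsequence can be computed with a small dynamic program instead of enumerating all possible worlds. The names `containment_probability` and `expected_support` are hypothetical.

```python
from typing import Sequence, Tuple

# Assumed model (not from the thesis): an uncertain sequence is a list of
# (item, existence_probability) pairs, and elements exist independently.
UncertainSequence = Sequence[Tuple[str, float]]


def containment_probability(pattern: Sequence[str], useq: UncertainSequence) -> float:
    """Probability that `pattern` is a subsequence of a random possible world.

    state[j] is the probability that greedy left-to-right matching has
    matched exactly the first j pattern items so far. Greedy matching
    decides subsequence containment, so state[m] at the end is the answer.
    Runs in O(len(pattern) * len(useq)) time.
    """
    m = len(pattern)
    state = [0.0] * (m + 1)
    state[0] = 1.0
    for item, prob in useq:
        # Go from high j to low j so one element advances a match by at most one step.
        for j in range(m - 1, -1, -1):
            if pattern[j] == item:
                moved = state[j] * prob
                state[j + 1] += moved
                state[j] -= moved
    return state[m]


def expected_support(pattern: Sequence[str], database: Sequence[UncertainSequence]) -> float:
    """Sum of containment probabilities over all sequences in the database."""
    return sum(containment_probability(pattern, useq) for useq in database)


if __name__ == "__main__":
    db = [
        [("a", 0.5), ("b", 0.5)],
        [("a", 0.5), ("a", 0.5), ("b", 1.0)],
    ]
    print(expected_support(["a", "b"], db))
```

A pattern could then be called frequent when its expected support reaches a user-chosen threshold; that criterion is also an assumption of this sketch rather than a definition taken from the thesis.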
Keywords/Search Tags:uncertain data, sequential pattern, data mining, nearest neighbor sequence