Research on Frequent Sequential Pattern Mining Algorithms in Uncertain Databases

Posted on: 2016-03-31 | Degree: Master | Type: Thesis
Country: China | Candidate: L B Li
GTID: 2428330473965674 | Subject: Software engineering
Abstract/Summary:
In recent years, uncertain data has received increasing attention. Data uncertainty is inherent in many real-world applications, such as sensor data monitoring, environmental surveillance, mobile tracking, and location-based services, owing to environmental factors, device limitations, privacy concerns, and so on. Uncertain data is widespread and often plays a key role in these applications, so uncertain data mining has become an important research topic in data mining. This thesis takes frequent sequential pattern mining in uncertain data as its research object. Mining sequential patterns in an uncertain sequence database is more complex than in a deterministic one and usually involves a huge search space. Because the two kinds of data differ, the sequential pattern mining methods designed for deterministic data, although widely used, cannot be applied directly to uncertain sequential pattern mining.

The thesis first analyzes methods for mining sequential patterns in deterministic data, covering both the candidate generate-and-test approach and the pattern-growth approach. A study of several typical algorithms shows that the pattern-growth approach is more scalable than the candidate generate-and-test approach. It then sets out the basic theory of uncertain data mining, adopting the possible-world model as the data model and considering both source-level and event-level uncertainty. For the general uncertain sequence data studied here, one must decide whether a sequential pattern counts as frequent; two criteria are commonly used for this: expected support and probabilistic frequentness. The thesis reviews frequent sequential pattern mining algorithms for uncertain databases and studies the general approaches and techniques for mining frequent sequential patterns in uncertain data. Comparing and analyzing uncertain sequential pattern mining algorithms, and relating them to the classical theoretical framework of sequential pattern mining, shows that here too the pattern-growth approach is more scalable than the candidate generate-and-test approach. To decide whether a pattern is frequent in uncertain sequence data, probabilistic frequentness is superior to expected support, and when computing probabilistic frequentness, the divide-and-conquer strategy costs less than the dynamic programming strategy.

Mining frequent sequential patterns in uncertain data can yield an exponential number of probabilistically frequent sequential patterns, some of which are useless and make the set of frequent sequences redundant. To address this, the thesis defines probabilistically frequent closed sequential patterns (p-FCSPs) and proposes U-FCSM, an algorithm for mining p-FCSPs from uncertain data. Under a tuple-level uncertain data model, U-FCSM computes the probability that a sequence is frequent using divide-and-conquer, and then, following the closed-sequence principle of the BIDE algorithm, decides whether each probabilistically frequent sequence is a p-FCSP. To reduce the search space and avoid redundant computation, it applies several pruning and bounding techniques. Finally, extensive experiments demonstrate the effectiveness and efficiency of U-FCSM.
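The pattern-growth approach that the abstract reports as more scalable on deterministic data can be sketched in the style of PrefixSpan: rather than generating candidate sequences and testing each one against the database, it grows a frequent prefix one item at a time and recurses on the projected database of suffixes that follow the prefix. This is a minimal illustration under stated assumptions (each event is a single item, no itemsets; `minsup` is an absolute count; the function name `prefixspan` is hypothetical), not the specific algorithm studied in the thesis.

```python
def prefixspan(db, minsup):
    """Pattern growth over single-item sequences; returns {pattern: support}."""
    results = {}

    def mine(prefix, projected):
        # Count each item at most once per projected suffix.
        counts = {}
        for seq in projected:
            for item in set(seq):
                counts[item] = counts.get(item, 0) + 1
        for item, sup in sorted(counts.items()):
            if sup < minsup:
                continue  # no extension of an infrequent prefix can be frequent
            new_prefix = prefix + (item,)
            results[new_prefix] = sup
            # Project: keep what follows the first occurrence of item.
            new_projected = [seq[seq.index(item) + 1:] for seq in projected if item in seq]
            mine(new_prefix, new_projected)

    mine((), [tuple(s) for s in db])
    return results


print(prefixspan(["abc", "acb", "bca", "ab"], 2))
# {('a',): 4, ('a', 'b'): 3, ('a', 'c'): 2, ('b',): 4, ('b', 'c'): 2, ('c',): 3}
```

Only prefixes that are already frequent are ever extended, and each extension scans a shrinking projected database, which is the intuition behind the scalability argument.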
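The possible-world semantics mentioned in the abstract can be made concrete by brute force. The sketch assumes, as a simplification, a tuple-level model in which each whole sequence is present independently with its own existence probability and events are single items; the sequences, probabilities, and function names are made up for illustration, and source-level or event-level uncertainty is not modeled. Each possible world is a subset of present sequences, its probability is the product of the existence and absence probabilities, and a pattern's support in that world is the number of present sequences containing it. Enumeration is exponential in the number of sequences, so it serves only as a reference for checking faster methods.

```python
from itertools import product


def is_subsequence(pattern, seq):
    """True if pattern occurs in seq as a (not necessarily contiguous) subsequence."""
    it = iter(seq)
    return all(item in it for item in pattern)  # 'in' advances the iterator


def support_pmf_possible_worlds(db, pattern):
    """P(sup = k) by enumerating every possible world.

    db is a list of (sequence, existence_probability) pairs; sequences are
    assumed independent (hypothetical simplified tuple-level model).
    """
    pmf = [0.0] * (len(db) + 1)
    for world in product([False, True], repeat=len(db)):
        prob, support = 1.0, 0
        for present, (seq, p) in zip(world, db):
            prob *= p if present else 1.0 - p
            if present and is_subsequence(pattern, seq):
                support += 1
        pmf[support] += prob
    return pmf


db = [("abc", 0.5), ("acb", 0.5), ("bca", 0.5)]
print(support_pmf_possible_worlds(db, "ab"))  # [0.25, 0.5, 0.25, 0.0]
```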
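The two frequentness criteria named in the abstract can be contrasted in a few lines. This sketch assumes each sequence supports a given pattern independently with a known probability (the list `probs`), a simplification of a tuple-level model; the function names are hypothetical. Expected support is the mean of the support random variable, while probabilistic frequentness is the probability that support reaches `minsup`, computed here with the standard dynamic-programming recurrence over the support distribution.

```python
def expected_support(probs):
    """Expected support: the sum of per-sequence support probabilities."""
    return sum(probs)


def support_pmf_dp(probs):
    """Distribution of the support count, P(sup = k), by dynamic programming.

    Assumes each sequence supports the pattern independently with
    probability probs[i] (hypothetical simplified tuple-level model).
    """
    pmf = [1.0]  # with no sequences processed, support is 0 with certainty
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, q in enumerate(pmf):
            nxt[k] += q * (1.0 - p)  # this sequence does not support the pattern
            nxt[k + 1] += q * p      # this sequence supports the pattern
        pmf = nxt
    return pmf


def frequentness_probability(probs, minsup):
    """Probabilistic frequentness: P(sup >= minsup)."""
    return sum(support_pmf_dp(probs)[minsup:])


# Two hypothetical patterns with the same expected support.
uncertain = [0.5, 0.5, 0.5, 0.5]
certain = [1.0, 1.0, 0.0, 0.0]
print(expected_support(uncertain), expected_support(certain))  # 2.0 2.0
print(frequentness_probability(uncertain, 2))                  # 0.6875
print(frequentness_probability(certain, 2))                    # 1.0
```

Both patterns have expected support 2.0, yet only the second reaches `minsup = 2` with certainty. Expected support cannot tell them apart, whereas probabilistic frequentness can, which is the kind of distinction behind the abstract's preference for the latter.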
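The divide-and-conquer strategy for computing probabilistic frequentness can be illustrated as follows. This is a sketch of the general technique under the same independence assumption as above, not the thesis's implementation, and all names are hypothetical: it splits the per-sequence probabilities in half, computes each half's support distribution recursively, and combines the halves by convolution. With the direct convolution used here for readability, the asymptotic cost matches the dynamic program; the savings usually attributed to divide-and-conquer come from replacing `convolve` with an FFT-based convolution. The dynamic program is included only to check that both produce the same distribution.

```python
import random


def support_pmf_dp(probs):
    """Reference: P(sup = k) by the sequential dynamic-programming recurrence."""
    pmf = [1.0]
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, q in enumerate(pmf):
            nxt[k] += q * (1.0 - p)
            nxt[k + 1] += q * p
        pmf = nxt
    return pmf


def convolve(a, b):
    """Direct convolution: distribution of X + Y for independent X ~ a, Y ~ b."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def support_pmf_dc(probs):
    """P(sup = k) by divide and conquer: split, recurse, convolve the halves."""
    if not probs:
        return [1.0]
    if len(probs) == 1:
        return [1.0 - probs[0], probs[0]]
    mid = len(probs) // 2
    return convolve(support_pmf_dc(probs[:mid]), support_pmf_dc(probs[mid:]))


def frequentness_probability_dc(probs, minsup):
    """P(sup >= minsup) using the divide-and-conquer distribution."""
    return sum(support_pmf_dc(probs)[minsup:])


random.seed(0)
probs = [random.random() for _ in range(50)]  # hypothetical support probabilities
dc, dp = support_pmf_dc(probs), support_pmf_dp(probs)
print(max(abs(x - y) for x, y in zip(dc, dp)) < 1e-9)  # True
```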
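The closure idea that U-FCSM borrows from BIDE can be illustrated on deterministic supports: a frequent sequence is closed when no proper supersequence has the same support, so the closed patterns summarize the full result without losing support information. The sketch below is a naive post-filter over an already-mined result (BIDE itself avoids this by checking forward and backward extensions during mining) and uses ordinary counts. The probabilistic condition that defines p-FCSPs in the thesis is not reproduced here. The input supports are those of the hypothetical database `["ab", "abc", "abc"]` at minimum support 2, and the function names are illustrative.

```python
def is_subsequence(pattern, seq):
    """True if pattern occurs in seq as a (not necessarily contiguous) subsequence."""
    it = iter(seq)
    return all(item in it for item in pattern)


def closed_patterns(frequent):
    """Keep patterns that have no proper supersequence with equal support."""
    closed = {}
    for p, sup in frequent.items():
        absorbed = any(
            len(q) > len(p) and sup_q == sup and is_subsequence(p, q)
            for q, sup_q in frequent.items()
        )
        if not absorbed:
            closed[p] = sup
    return closed


# Supports of every frequent pattern of the hypothetical database
# ["ab", "abc", "abc"] with minimum support 2.
frequent = {
    ("a",): 3, ("b",): 3, ("c",): 2,
    ("a", "b"): 3, ("a", "c"): 2, ("b", "c"): 2,
    ("a", "b", "c"): 2,
}
print(closed_patterns(frequent))  # {('a', 'b'): 3, ('a', 'b', 'c'): 2}
```

Seven frequent patterns reduce to two closed ones, which is the kind of redundancy the abstract says p-FCSPs are meant to remove in the probabilistic setting.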
Keywords/Search Tags: uncertain data, probabilistic frequentness, frequent sequential pattern, data mining, probabilistically frequent closed sequential patterns