Research On Multi-dimensional Sequential Pattern Mining Algorithm

Posted on: 2013-02-26 | Degree: Master | Type: Thesis
Country: China | Candidate: N Zuo | Full Text: PDF
GTID: 2218330362962918 | Subject: Computer application technology
Abstract/Summary:
Existing sequential pattern mining algorithms discover sequential patterns only in single-sequence information. In many applications, however, a user may want the sequence database to incorporate relevant multi-dimensional information and then mine the desired patterns from it. This paper therefore focuses on how to mine multi-dimensional sequential patterns efficiently in a multi-dimensional sequence database. The problem is significant in a wide range of applications, including customer purchase behavior analysis, web access pattern analysis, and disease diagnosis.

First, this paper proposes a new mixed method, Seq-Cmp, for mining multi-dimensional sequential patterns. Seq-Cmp first mines sequential patterns from the sequence information. For each sequential pattern, it then collects the multi-dimensional information of the tuples in the database that contain the pattern; these form the corresponding projected multi-dimensional database. Finally, it uses either the array-based structure H-arrays or the tree-based structure H-forest to mine multi-dimensional patterns, depending on whether the projected database is sparse or dense. When dimensionality is high, the algorithm outperforms the Seq-Dim and Seq-mdp algorithms.

Second, to effectively find frequent web multi-dimensional sequential patterns in multi-dimensional sequence data with multi-dimensional information, this paper proposes a new algorithm, ExtSeq-MIDim. The algorithm first uses an extended sequential pattern mining method to mine sequential patterns from the multi-dimensional sequence data. For each sequential pattern, it then finds the corresponding multi-dimensional patterns in the multi-dimensional information of the database tuples that support the pattern. For the multi-dimensional pattern mining step, this paper puts forward a new, fast, and efficient mining method based on memory indexing.

Third, this paper proposes the algorithm PSeq-MIDim for mining multi-dimensional sequential patterns from a multi-dimensional multi-sequence database, which contains multiple sequential dimensions. The algorithm first uses the PSeq algorithm to mine sequential patterns in a starting sequential dimension and stores them in a lattice structure. Guided by the lattice and starting from the mined sequential patterns, PSeq then iteratively propagates the sequential patterns' position sets and the propagation table from k sequential dimensions to k+1 sequential dimensions, obtaining sequential patterns that span multiple sequential dimensions, until all sequential dimensions have been propagated. Finally, it forms the corresponding projected multi-dimensional database for each sequential pattern across the sequential dimensions and uses the memory-indexing method to mine the multi-dimensional patterns within the projected databases.

Finally, we verify the above algorithms through three experiments, which achieve the anticipated results.
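The sequence-first strategy described in the abstract (mine sequential patterns, build a projected multi-dimensional database for each pattern, then mine dimension patterns inside it) can be illustrated with a minimal Python sketch. This is not the thesis's Seq-Cmp, ExtSeq-MIDim, or PSeq-MIDim implementation: it assumes sequences of single items, uses a simple PrefixSpan-style growth for the sequence step, and replaces H-arrays, H-forest, and memory indexing with brute-force counting. All function names and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def is_subsequence(pattern, sequence):
    """Return True if pattern occurs in sequence in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def mine_sequential_patterns(sequences, min_support):
    """PrefixSpan-style depth-first growth over sequences of single items."""
    results = {}

    def grow(prefix, projected):
        # Count each item at most once per projected suffix.
        counts = Counter()
        for suffix in projected:
            counts.update(set(suffix))
        for item, support in sorted(counts.items()):
            if support < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = support
            # Keep what follows the first occurrence of item in each suffix.
            grow(pattern, [s[s.index(item) + 1:] for s in projected if item in s])

    grow((), [tuple(s) for s in sequences])
    return results


def mine_dimension_patterns(dim_tuples, min_support):
    """Brute-force frequent value combinations; '*' matches any value.

    Assumption: a stand-in for the H-arrays / H-forest / memory-indexing
    methods named in the abstract, which are not reproduced here.
    """
    counts = Counter()
    for dims in dim_tuples:
        n = len(dims)
        for k in range(1, n + 1):
            for positions in combinations(range(n), k):
                counts[tuple(dims[i] if i in positions else "*" for i in range(n))] += 1
    return {p: c for p, c in counts.items() if c >= min_support}


def mine_md_sequential_patterns(database, min_support):
    """database: list of (dimension_tuple, sequence) records (hypothetical layout)."""
    sequences = [seq for _, seq in database]
    out = {}
    for pattern in mine_sequential_patterns(sequences, min_support):
        # Projected multi-dimensional database: dimension tuples of the
        # records whose sequence contains this sequential pattern.
        projected = [dims for dims, seq in database if is_subsequence(pattern, seq)]
        for dim_pattern, support in mine_dimension_patterns(projected, min_support).items():
            out[(dim_pattern, pattern)] = support
    return out


# Hypothetical sample data: (city, occupation) plus an item sequence.
database = [
    (("Beijing", "student"), ["a", "b", "c"]),
    (("Beijing", "student"), ["a", "c"]),
    (("Shanghai", "student"), ["a", "b"]),
    (("Beijing", "teacher"), ["b", "c"]),
]
result = mine_md_sequential_patterns(database, min_support=2)
```

In this toy database, the two records that support the sequence (a, c) share the dimension values (Beijing, student), so that combination is reported with support 2. The brute-force dimension step grows exponentially with the number of dimensions, which is the kind of cost that specialized structures for high-dimensional data are meant to avoid.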
Keywords/Search Tags: sequential pattern mining, multi-dimensional sequential pattern, projected database, lattice structure, prefix-index