Font Size: a A A

Research On Parallel Algorithm For Sequential Pattern Mining

Posted on:2008-06-17Degree:MasterType:Thesis
Country:ChinaCandidate:Y WangFull Text:PDF
GTID:2178360218952631Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
Sequential pattern mining is the mining of frequent seqences related to time or other orders from the sequence database. Its initial motivation is to discover the laws of customer purchasing in a time section by finding the frequent sequences. In recent years, sequential pattern mining has become an important direction of data mining, and its application field has not been confined to the transanction database and has extended to new data sources such as Web and advanced science fields such as DNA analysis.Combining with the relevant definitions put forward by R.Agrawal and R.Srikant and Concept Lattice Theory brought forward by R.Wille, this paper proposes the term of frequent concept. To fulfill the needs of sequential pattern parallel mining, there proposes search space partition theory which includes the definitions of search space, sub search space, equivalence sub search space and maximum search space.The data of sequential pattern mining has characteristics as follows: mass data amount and distributed storage. Most existing sequential pattern mining algorithms havn't considered the above-mentioned characteristics synthetically. Contraposing the traits mentioned above and combining the paralle theory, this paper put forward a new distributed parallel algorithm SPP(Sequential Pattern Parallel). The algorithm abides by the principal of pattern reduction and utilizes the divide-and-conquer strategy for parallelization. The first parallel task is to construct frequent itemsets applying frequent concept and search space partition theory and the second task is to structure frequent sequences using the depth-first search method at each processor. The algorithm only needs to access the database twice and doesn't generate the candidated sequences, which abates the access time and improves the minging efficiency. But this algorithm adopts the static loading balancing, so it is also needed to improve.Based on the random data generation procedure and different information structure designed, this paper simulated the SPP algorithm in a concrete parallel environment and implemented the AprioriAll on a computer. The experiments demonstrated that compared with AprioriAll, the SPP algorithm had excellent speedup factor and efficiency.
Keywords/Search Tags:sequential pattern, parallel, concept lattice, search space
PDF Full Text Request
Related items