Research On Sequential Pattern Mining Based On Prime Encoding

Posted on: 2012-02-11
Degree: Master
Type: Thesis
Country: China
Candidate: L L Sun
Full Text: PDF
GTID: 2248330395964024
Subject: Computer application technology
Abstract/Summary:
Sequential pattern mining is an active research branch of data mining with wide applications, such as customer purchase analysis, web clickstream analysis, and biological sequence analysis. Many algorithms have been proposed, including GSP, SPADE, and PrefixSpan. However, current algorithms extract only the frequent sequences that satisfy the minimum support threshold minsup. Users may need more abstract information, which calls for multi-level sequential pattern mining: based on a taxonomy, the items in the sequence database are classified into categories that form different levels. Users may also care more about the important sequential patterns, which calls for weighted sequential pattern mining, where a weight constraint is applied throughout the mining process. Because prime encoding, drawn from prime number theory, has good mathematical properties, expresses hierarchical information distinctly, and is simple to update, this thesis systematically studies the combination of sequential pattern mining and prime encoding. The main research results are as follows:

(1) In multi-level sequential pattern mining, the encoding must not only express the hierarchical relationships but also make the relationships between levels easy to identify, since this directly affects the efficiency of the algorithm. We prove that, with prime encoding, a single division operation decides the parent-child relationship between levels, and we present the PMSM and CROSS-PMSM algorithms, both based on prime encoding, for mining multi-level and cross-level sequential patterns respectively. Experimental results show that the algorithms can effectively extract multi-level and cross-level sequential patterns from the sequence database.

(2) In weighted sequential pattern mining, MWSP is one of the best existing algorithms, but because it is based on the candidate generation-and-test approach, it is prone to a combinatorial explosion of candidates during mining. This thesis therefore presents an efficient algorithm, PWSM, which introduces the concept of K-minimum weighted support, uses prefix-projected databases to avoid the candidate combinatorial explosion, and takes full advantage of the minimum weighted support to optimize the algorithm. Experimental results show that PWSM is more effective than MWSP at mining weighted sequential patterns from the sequence database.

(3) Building on multi-level and weighted sequential pattern mining, this thesis defines the multi-level weighted sequential pattern (MWSP, Multi-level Weighted Sequential Pattern), proposes a framework for multi-level weighted sequential pattern mining, and presents PMWSM, a multi-level weighted sequential pattern mining algorithm based on prime encoding. The algorithm can provide more abstract information and discover highly valuable sequential patterns to meet users' needs. Experimental results show that PMWSM has excellent performance in terms of time and space complexity.
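
The division test behind prime encoding can be illustrated with a small sketch. The abstract does not spell out the encoding, so the scheme below is an assumption (a common one): every taxonomy node receives its own prime, and a node's code is its parent's code multiplied by that prime. The taxonomy and the names `encode`, `is_ancestor`, and `is_parent` are hypothetical and are not taken from the thesis.

```python
from itertools import count


def primes():
    """Yield 2, 3, 5, ... by trial division (adequate for small taxonomies)."""
    found = []
    for n in count(2):
        if all(n % p for p in found):
            found.append(n)
            yield n


def encode(taxonomy, root):
    """Assign codes to a taxonomy given as {parent: [children]}.

    Assumed scheme: the root has code 1, each other node gets a unique
    prime, and code(node) = code(parent) * prime(node).
    Returns (code, own), where own[node] is the node's own prime.
    """
    gen = primes()
    code, own = {root: 1}, {root: 1}
    stack = [root]
    while stack:
        parent = stack.pop()
        for child in taxonomy.get(parent, []):
            own[child] = next(gen)
            code[child] = code[parent] * own[child]
            stack.append(child)
    return code, own


def is_ancestor(code, a, d):
    """a is a proper ancestor of d iff code[a] divides code[d]."""
    return a != d and code[d] % code[a] == 0


def is_parent(code, own, p, c):
    """p is the parent of c iff dividing out c's own prime leaves p's code."""
    return c != p and code[c] == code[p] * own[c]


# Hypothetical three-level taxonomy used only for illustration.
taxonomy = {
    "food": ["drink", "bread"],
    "drink": ["milk", "juice"],
    "bread": ["wheat bread"],
}
code, own = encode(taxonomy, "food")
print(is_ancestor(code, "drink", "milk"))
print(is_ancestor(code, "bread", "milk"))
```

Because every prime belongs to exactly one node, one code divides another only when the first node lies on the path from the root to the second, so no tree traversal or string comparison is needed. Adding a new leaf only requires one fresh prime and one multiplication, which is consistent with the simple update operation the abstract mentions.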
Keywords/Search Tags: data mining, sequence pattern mining, prime encoding, multi-level sequence pattern, weighted sequence pattern