
Logic-based Frequent Sequential Pattern Mining Algorithm

Posted on: 2015-01-21
Degree: Master
Type: Thesis
Country: China
Candidate: J Feng
GTID: 2308330461488646
Subject: Software engineering
Abstract/Summary:
Frequent sequential pattern mining is an important research field in data mining. Sequence data are common in daily life and have significant commercial value.

Frequent sequential pattern mining is typically based on the support framework, in which a minimum support threshold must be supplied before the discovery process can start; this framework is a variant of Apriori, the classic association rule mining algorithm. Current algorithms of this kind have two major problems. The first is setting the support threshold. An Apriori-like algorithm must predefine a minimum support threshold to decide whether a candidate is a frequent pattern. However, users generally have no precise understanding of what the threshold should be, and they set it mainly through repeated trial and error or prior experience during the mining process. Moreover, there are no uniform criteria for choosing the minimum support. The second problem is that the mining results are too large for users to understand. If a sequential pattern P is frequent, all subsequence patterns of P are also frequent, so the size of the result can grow exponentially with the length of the patterns. This also makes the discovered sequential patterns harder for users to interpret.

Based on an analysis of existing frequent sequential pattern mining algorithms, this thesis introduces ideas from propositional logic into this field for the first time to address the problems above. The main contributions are as follows:
1. We propose a logic-based frequent sequential pattern mining algorithm that brings logical reasoning into the frequent pattern mining process. Using properties of propositional logic, it substantially reduces the result set and filters out many illogical results, which shortens running time and improves result quality. In addition, the algorithm can run without a predefined minimum support threshold, reducing the dependence on it.
2. In the filter stage, we propose an approach that calculates the lower and upper bounds of the sub-itemsets in the consequent part of an itemset. This narrows the range of the result set and excludes invalid candidate sequences, which speeds up the mining process and improves result quality.

Experiments show that the proposed approach performs well compared with the traditional Apriori-like GSP algorithm. By using logic rules, it substantially reduces the result sets and the dependence on the support threshold. At the same time, the rule sets are much smaller, which improves the understandability and usability of the results. These results demonstrate the feasibility and advantages of the algorithm.
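The support framework and the result-size problem described in the abstract can be illustrated with a small sketch. The Python below is not the thesis's algorithm and not GSP itself; it only assumes the usual GSP-style data model, in which a sequence is a list of itemsets and a pattern is contained in a sequence when its elements are subsets of distinct, order-preserving elements of that sequence. The names `contains`, `support`, `subpatterns`, `DB` and `P` and the four-sequence database are hypothetical. The sketch shows that every sub-pattern of a pattern is at least as frequent as the pattern itself, so reporting all frequent patterns also reports all of their sub-patterns.

```python
from itertools import combinations, product
from typing import FrozenSet, List, Sequence, Set, Tuple

Itemset = FrozenSet[str]
Pattern = Tuple[Itemset, ...]


def contains(data: Sequence[Itemset], pattern: Sequence[Itemset]) -> bool:
    """True if `pattern` is a subsequence of `data`: each pattern element is a
    subset of a data element that comes after the previous match (greedy scan)."""
    i = 0
    for element in data:
        if i < len(pattern) and pattern[i] <= element:
            i += 1
    return i == len(pattern)


def support(db: List[List[Itemset]], pattern: Sequence[Itemset]) -> float:
    """Relative support: fraction of data sequences that contain `pattern`."""
    return sum(contains(s, pattern) for s in db) / len(db)


def subsets(itemset: Itemset) -> List[Itemset]:
    """All subsets of an itemset, including the empty set."""
    items = sorted(itemset)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def subpatterns(pattern: Pattern) -> Set[Pattern]:
    """Every non-empty pattern contained in `pattern` (drop items and/or elements)."""
    result: Set[Pattern] = set()
    for choice in product(*(subsets(e) for e in pattern)):
        sub = tuple(e for e in choice if e)  # elements that became empty are dropped
        if sub:
            result.add(sub)
    return result


# Hypothetical toy database: each data sequence is a list of itemsets.
DB = [
    [frozenset("a"), frozenset("bc"), frozenset("d")],
    [frozenset("a"), frozenset("c"), frozenset("d")],
    [frozenset("ab"), frozenset("c"), frozenset("d")],
    [frozenset("b"), frozenset("d")],
]
P: Pattern = (frozenset("a"), frozenset("c"), frozenset("d"))

if __name__ == "__main__":
    print("support(P) =", support(DB, P))
    for q in sorted(subpatterns(P), key=len):
        # Downward closure: no sub-pattern is less frequent than P itself.
        print([sorted(e) for e in q], support(DB, q))
```

Here P has support 0.75, and a pattern of three single-item elements already has 2^3 - 1 = 7 non-empty sub-patterns, each with support of at least 0.75; a minimum support of 0.75 would therefore report all seven alongside P.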
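The abstract does not state the exact lower and upper bounds used in the filter stage. As an illustration only, the sketch below uses two standard bounds that follow from reading support as the probability of a conjunction of propositions ("the transaction contains item i"): an itemset is never more frequent than any of its sub-itemsets, and for any split C = X ∪ Y, s(C) >= s(X) + s(Y) - 1 (a Fréchet/Bonferroni-type inequality). The functions `itemset_support` and `support_bounds` and the toy transactions are hypothetical; the sketch works on plain itemsets rather than sequences and, for brevity, rescans the database for each sub-itemset, whereas a level-wise miner would reuse counts from the previous level.

```python
from fractions import Fraction
from itertools import combinations
from typing import FrozenSet, List, Tuple

Itemset = FrozenSet[str]


def itemset_support(db: List[Itemset], itemset: Itemset) -> Fraction:
    """Exact relative support: fraction of transactions that contain `itemset`."""
    return Fraction(sum(itemset <= t for t in db), len(db))


def support_bounds(db: List[Itemset], itemset: Itemset) -> Tuple[Fraction, Fraction]:
    """Lower and upper bounds on s(itemset) computed only from its proper sub-itemsets."""
    items = sorted(itemset)
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items to bound from sub-itemsets")
    # Upper bound (anti-monotonicity): never more frequent than any (k-1)-subset.
    upper = min(itemset_support(db, frozenset(sub)) for sub in combinations(items, k - 1))
    # Lower bound: for every split C = X | Y, s(C) >= s(X) + s(Y) - 1,
    # because a transaction missing C must miss X or miss Y.
    lower = Fraction(0)
    for r in range(1, k):
        for x in combinations(items, r):
            x_set = frozenset(x)
            y_set = frozenset(itemset) - x_set
            lower = max(lower, itemset_support(db, x_set) + itemset_support(db, y_set) - 1)
    return lower, upper


# Hypothetical toy transactions.
DB = [frozenset("abc"), frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("bc")]

if __name__ == "__main__":
    candidate = frozenset("abc")
    lower, upper = support_bounds(DB, candidate)
    print("bounds:", lower, upper, "actual:", itemset_support(DB, candidate))
```

For {a, b, c} the bounds are [2/5, 3/5] and the true support, 2/5, lies at the lower bound. When the two bounds coincide, a candidate's support is fully determined by its sub-itemsets and need not be counted; when the upper bound is already too low for the candidate to matter, it can be discarded without a database scan.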
Keywords/Search Tags: Sequential pattern mining, Data mining, Propositional logic, Support threshold