Logic-based Frequent Sequential Pattern Mining Algorithm

Posted on:2015-01-21

Degree:Master

Type:Thesis

Country:China

Candidate:J Feng

GTID:2308330461488646

Subject:Software engineering

Abstract/Summary:

Frequent sequential pattern mining is an important research field in data mining. Sequence data is common in daily life and has significant commercial value. Frequent sequential pattern mining is typically built on the support framework, in which a minimum support must be supplied to start the discovery process, and its algorithms are variants of Apriori, the classic association rule mining algorithm. These algorithms currently face two major problems. The first is setting the support threshold. An Apriori-like algorithm needs a predefined minimum support threshold to decide whether a candidate is a frequent pattern. Users, however, generally have no precise understanding of the support threshold and set it mainly through repeated trial and error or from experience, and there are no uniform criteria to follow. The second is that the mining results are too large for users to understand. Specifically, if a sequential pattern P is frequent, all sub-sequence patterns of P are also frequent, so the size of the result can grow exponentially with the length of P, which makes the mined sequential patterns harder to interpret.

Based on an analysis of existing frequent sequential pattern mining algorithms, this paper introduced ideas from propositional logic into this field for the first time to address the problems above. The main contributions are as follows:

1. We proposed a logic-based frequent sequential pattern mining algorithm that brings logic into the frequent pattern mining process. Using the properties of propositional logic, it substantially reduced the result set by filtering out many illogical results, which lowered the running time and improved the quality of the result. The algorithm can also run without a known minimum support threshold, reducing the dependence on it.
2. In the filter stage, we proposed an approach that calculates the corresponding lower and upper bounds of the sub-itemsets in the consequent part of an itemset. This narrows the range of the result set and excludes invalid candidate sequences, which sped up the mining process and improved the quality of the result.

Experiments showed that the proposed approach performs well compared with the traditional Apriori-like GSP algorithm. By using logic rules, we substantially reduced both the result sets and the dependence on the support threshold. At the same time, the rule sets were compressed considerably, which improved the understandability and usability of the results. These results demonstrate the feasibility and advantages of the algorithm.
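The two problems named in the abstract, the dependence on a minimum support threshold and the growth of the result through frequent sub-patterns, can be illustrated with a small sketch. This is not the thesis's algorithm. It assumes a simplified setting in which each sequence is an ordered list of single items (GSP-style algorithms work on sequences of itemsets), and the toy database, the threshold value, and the helper names `is_subsequence`, `support`, and `sub_patterns` are all hypothetical.

```python
from itertools import combinations


def is_subsequence(pattern, sequence):
    """Return True if `pattern` occurs in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    # `item in it` advances the iterator, so later items must appear later.
    return all(item in it for item in pattern)


def support(pattern, database):
    """Fraction of sequences in `database` that contain `pattern`."""
    return sum(is_subsequence(pattern, s) for s in database) / len(database)


def sub_patterns(pattern):
    """All distinct non-empty sub-sequences of `pattern`, as tuples."""
    return {
        tuple(pattern[i] for i in idx)
        for k in range(1, len(pattern) + 1)
        for idx in combinations(range(len(pattern)), k)
    }


# Hypothetical toy database of item sequences (assumption, not thesis data).
database = [
    ["a", "b", "c", "d"],
    ["a", "c", "d"],
    ["b", "a", "c"],
    ["a", "b", "d"],
]

pattern = ("a", "c", "d")
min_support = 0.5  # illustrative threshold; choosing it is the first problem

subs = sub_patterns(pattern)
frequent_subs = {p for p in subs if support(p, database) >= min_support}

print(f"support{pattern} = {support(pattern, database)}")
print(f"{len(frequent_subs)} of {len(subs)} sub-patterns are also frequent")
```

In this sketch every sub-sequence of a frequent pattern is itself reported as frequent, so a single pattern of n distinct items contributes up to 2^n - 1 entries to the result. That redundancy is the kind a filtering step such as the one described above aims to reduce.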

Keywords/Search Tags:

Sequential pattern mining, Data mining, Propositional logic, Support threshold

Related items

1	Constraint-based Sequential Pattern Mining And Its Applications
2	Keyword Extraction Based On Sequential Pattern Mining
3	A Novel Classification Based On Sequential Pattern Mining In Videos
4	The Research Of Mining Access Sequential Pattern In WebLog
5	Design And Implementation Of The Phone Virus System Based On Sequential Patterns Mining
6	Research And Application Of Mining Access Sequential Pattern In Weblog
7	Research On Sequential Pattern Mining Algorithm In Recommendation Of Hypertensive Drugs
8	Research On The Sequence Mining Algorithm And Its Application In User Behavior Analysis
9	Web Log Mining And Its Application Based On Sequential Pattern
10	Research And Application Of Projection Position-Based Sequential Pattern Mining Algorithm