
Improvement And Research On Apriori-based Algorithm Of Sequential Patterns Mining

Posted on: 2016-11-08
Degree: Master
Type: Thesis
Country: China
Candidate: G N Zheng
Full Text: PDF
GTID: 2308330464967958
Subject: Computer application technology
Abstract/Summary:
Most Apriori-based algorithms for sequential pattern mining rely, directly or indirectly, on the Apriori property: every nonempty subsequence of a sequential pattern is itself a sequential pattern. However, generating the candidate sequence set through the self-joining and pruning operations is costly, and it produces a large number of candidate sequences, leading to combinatorial explosion, especially when the sequences are large, which can render the algorithm ineffective. Studying and improving the classic AprioriAll algorithm on the basis of this property therefore has value for both theoretical research and practical application. This thesis aims to reduce the number of comparisons in the self-joining and pruning operations and the number of database scans, improving the existing main algorithm and proposing a new strategy. After studying Apriori and several improved association rule algorithms, we transplant optimization strategies from the Apriori algorithm to the Apriori-based sequential pattern mining algorithm. By changing the condition and procedure of the join phase so that the frequent k-length sequences are also kept in lexicographic (dictionary) order, the algorithm can exploit this ordering to reduce the number of comparisons in the self-joining and pruning stages and thus run more efficiently. We also use sub-sequence generation rules to delete sequences from the frequent sequence set before the join operation, and design a new, efficient algorithm on this basis. Combining the improvement strategies from association rule mining and sequential pattern mining, we propose three strategies (connect, delete, and cut) that are feasible in theory, and we give a detailed quantitative analysis of the algorithm's space and time efficiency.
The improved algorithm is implemented in Java, and the experimental data are generated with the IBM synthetic data generator. The different improvement strategies are run in the same environment, and the results are visualized with the ECharts plugin. The experimental results show that the improved algorithm is much more efficient.
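The abstract does not give the thesis's actual join and pruning procedures, so the following Java sketch only illustrates the general idea it describes: keeping the frequent (k-1)-length sequences in dictionary order so that the self-join can stop early, then pruning with the Apriori property. The class name `SortedJoinSketch`, its method names, and the representation of a sequence as a list of integer ids are all assumptions made for illustration, and the choice to join only distinct pairs is a simplification.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not the thesis's algorithm: AprioriAll-style candidate
// generation over frequent (k-1)-sequences kept in dictionary order.
final class SortedJoinSketch {

    // Dictionary (lexicographic) order on sequences of integer ids.
    static final Comparator<List<Integer>> LEX = (a, b) -> {
        for (int i = 0; i < Math.min(a.size(), b.size()); i++) {
            int c = Integer.compare(a.get(i), b.get(i));
            if (c != 0) return c;
        }
        return Integer.compare(a.size(), b.size());
    };

    // True when p and q agree on everything except their last element.
    static boolean samePrefix(List<Integer> p, List<Integer> q) {
        int n = p.size() - 1;
        return p.subList(0, n).equals(q.subList(0, n));
    }

    static List<Integer> extend(List<Integer> p, int last) {
        List<Integer> c = new ArrayList<>(p);
        c.add(last);
        return c;
    }

    // Apriori property: a candidate survives only if every subsequence
    // obtained by dropping one element is itself frequent.
    static boolean allSubsequencesFrequent(List<Integer> cand, Set<List<Integer>> frequent) {
        for (int drop = 0; drop < cand.size(); drop++) {
            List<Integer> sub = new ArrayList<>(cand);
            sub.remove(drop); // removes by index, since drop is a primitive int
            if (!frequent.contains(sub)) return false;
        }
        return true;
    }

    // Assumes all input sequences have the same length k-1 >= 1.
    static List<List<Integer>> generateCandidates(List<List<Integer>> frequentPrev) {
        List<List<Integer>> sorted = new ArrayList<>(frequentPrev);
        sorted.sort(LEX);
        Set<List<Integer>> lookup = new HashSet<>(sorted);
        List<List<Integer>> result = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i++) {
            List<Integer> p = sorted.get(i);
            // Sequences sharing p's prefix are contiguous in sorted order,
            // so the inner loop stops at the first prefix mismatch.
            for (int j = i + 1; j < sorted.size() && samePrefix(p, sorted.get(j)); j++) {
                List<Integer> q = sorted.get(j);
                int lastIdx = p.size() - 1;
                // Order matters in a sequence, so both extensions are candidates.
                List<Integer> pq = extend(p, q.get(lastIdx));
                List<Integer> qp = extend(q, p.get(lastIdx));
                if (allSubsequencesFrequent(pq, lookup)) result.add(pq);
                if (allSubsequencesFrequent(qp, lookup)) result.add(qp);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<Integer>> l3 = List.of(
            List.of(1, 2, 3), List.of(1, 2, 4), List.of(1, 3, 4),
            List.of(1, 3, 5), List.of(2, 3, 4));
        System.out.println(generateCandidates(l3)); // prints [[1, 2, 3, 4]]
    }
}
```

In this illustrative run the join proposes four length-4 candidates, and pruning keeps only the one whose length-3 subsequences are all frequent. Because the input is sorted, each sequence is compared only with its neighbours that share the same prefix rather than with the whole set, which is the kind of saving in comparisons the abstract refers to.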
Keywords/Search Tags: Apriori, Sequential patterns mining, Self-joining, Pruning, Generation rules of sub-sequence, Data visualization