Research On Contrast Sequential Pattern Mining Based On Subsequence Distribution Variation

Posted on:2020-06-11

Degree:Master

Type:Thesis

Country:China

Candidate:Q Li

GTID:2428330620951118

Subject:Computer Science and Technology

Abstract/Summary:

Contrast sequential pattern mining is an important research task in data mining. It aims to discover the differences between different classes of sequence data. How to efficiently mine meaningful, easy-to-analyze patterns from sequence data is a major open problem in current research. Researchers have designed many algorithms for mining contrast sequential patterns. However, most of them are built on occurrence counts or support frameworks and ignore the effect of subsequence distribution on patterns. An existing algorithm for emerging sequential pattern mining does consider the location information of subsequences, but it uses a fixed location to identify distribution differences between classes of sequence data: it looks for subsequence patterns that appear before a given distinguishing location in one sequence dataset and after the same location in another. Without sufficient prior knowledge, it is difficult for users to set appropriate location thresholds. Moreover, because the distinguishing location differs from one subsequence to another, a fixed location threshold may miss many meaningful patterns.

A large amount of sequence data carries time tags, so the time attribute is also a non-negligible factor in the analysis of sequence data. An algorithm that can automatically analyze differences in the time distribution of events would help decision makers make the right decisions. In addition, with the generation of large amounts of biological data, methods that can automatically analyze the differences between classes of biological sequences are urgently needed. However, previous studies on contrast sequential pattern mining did not consider the effect of the spatial location distribution of genes/amino acids within a given biological sequence.

In response to the above problems, the main contributions of this dissertation are as follows:

(1) A contrast sequential pattern mining method is proposed that is based on subsequence time distribution variation and satisfies a discreteness constraint. Building on the design of the suffix tree, the algorithm first maps all the suffix substrings generated by each sequence in the dataset to the paths of a tree. Each node in this tree stores the time information and the count of its item. The algorithm then visits each node by depth-first search to mine the patterns that satisfy the corresponding conditions. A discreteness constraint on the time series is also proposed to ensure the compactness of the subsequence time distribution. Experimental results on user behavior datasets and online retail datasets show that the proposed algorithm mines more meaningful patterns and achieves better classification performance.

(2) A method is proposed for mining contrast sequential patterns from biological sequences based on the spatial location distribution of subsequences. The algorithm maps each instance in the dataset, together with all of its suffix substrings, to the paths of a tree and mines the patterns that satisfy the corresponding conditions in a depth-first manner. Unlike the contrast pattern tree based on subsequence time distribution variation, each node here stores the location information and the count of its item, and the performance of the pattern tree is further optimized. The experimental results show that the proposed patterns are meaningful for mining biological sequences, and that using them as classification features improves classification performance.
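
The tree-based procedure described above can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: the abstract does not give the exact mining conditions or the definition of the discreteness constraint, so everything below is an assumption made for illustration. In particular, the names Node, build_suffix_trie and mine are hypothetical, counts are taken per occurrence, the "time distribution difference" is approximated by the gap between per-class mean timestamps (min_gap), and the discreteness constraint is approximated by a cap on the population standard deviation of the timestamps (max_spread).

```python
from collections import defaultdict
from statistics import mean, pstdev


class Node:
    """Trie node: children plus, per class label, the timestamps at which
    the pattern ending at this node was completed (one per occurrence)."""

    def __init__(self):
        self.children = {}
        self.times = defaultdict(list)  # class label -> list of timestamps


def build_suffix_trie(dataset, max_len=3):
    """dataset: iterable of (label, [(item, timestamp), ...]).
    Inserts every suffix of every sequence, truncated to max_len items,
    so each contiguous subsequence of length <= max_len maps to one path."""
    root = Node()
    for label, seq in dataset:
        for start in range(len(seq)):
            node = root
            for item, ts in seq[start:start + max_len]:
                node = node.children.setdefault(item, Node())
                node.times[label].append(ts)
    return root


def mine(root, pos, neg, min_count=2, min_gap=1.0, max_spread=1.0):
    """Depth-first walk over the trie. Keeps a pattern when both classes
    contain it at least min_count times, each class's timestamps are compact
    (assumed stand-in for the discreteness constraint), and the class means
    differ by at least min_gap (assumed stand-in for distribution variation)."""
    results = []
    stack = [((), root)]
    while stack:
        pattern, node = stack.pop()
        for item, child in node.children.items():
            stack.append((pattern + (item,), child))
        tp, tn = node.times.get(pos, []), node.times.get(neg, [])
        if len(tp) < min_count or len(tn) < min_count:
            continue
        if max(pstdev(tp), pstdev(tn)) > max_spread:
            continue
        if abs(mean(tp) - mean(tn)) >= min_gap:
            results.append((pattern, mean(tp), mean(tn)))
    return sorted(results)


# Made-up toy data: two classes of timestamped user-behavior sequences.
data = [
    ("A", [("login", 1), ("search", 2), ("buy", 3)]),
    ("A", [("login", 1), ("search", 3), ("buy", 4)]),
    ("B", [("login", 1), ("browse", 5), ("search", 8), ("buy", 9)]),
    ("B", [("login", 2), ("search", 9), ("buy", 10)]),
]
found = mine(build_suffix_trie(data, max_len=2), "A", "B",
             min_count=2, min_gap=3, max_spread=1.0)
for pattern, mean_a, mean_b in found:
    print(pattern, mean_a, mean_b)
```

Under the same assumptions, storing positions within a biological sequence in place of timestamps gives the analogous structure for contribution (2), where each node holds location information and counts.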

Keywords/Search Tags:

Contrast sequential pattern, Subsequence distribution variation, Discreteness constraint, Classification

Related items

1	Constraint-based Sequential Pattern Mining And Its Applications
2	Research On Sequential Pattern Mining Algorithm Based On Constraints
3	Research On The Sequential Pattern Mining Algorithms Using Prefix-tree Structure
4	The Research Of Conditional Discriminative Sequential Pattern Mining Algorithm
5	Research Of Intrusion Detection Based On Sequential Pattern Mining
6	Multi-threshold Based Contrast Pattern Mining And Its Application In Classification Of Imbalanced Datasets
7	A Novel Classification Based On Sequential Pattern Mining In Videos
8	The Research And Application Of Multi-Dimensional Sequential Pattern In The Analysis Of Broadcast Listen Rate
9	The Research And Implementation Of Crucial Problems In Sequential Pattern Mining
10	Frequency Distribution Of Biological Subsequence And Classification Model For Tumor Subtype