Font Size: a A A

Research On Key Techniques Of Mining Repetitive Positive And Negative Sequential Patterns

Posted on:2017-10-17Degree:MasterType:Thesis
Country:ChinaCandidate:Y S GongFull Text:PDF
GTID:2348330491457957Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
Mining sequential patterns is an important branch of data mining and knowledge discovery.It aims to find interesting sequences and provide support for theoretical analysis and practical applications.Unlike traditional sequential patterns,negative sequential patterns(NSP)provide a special perspective of analyzing sequential patterns,which focuses on non-occurring but interesting behaviors.It considers both occurring and non-occurring relationships of sequential elements.Non-occurring elements refer to those behaviors that should occur but do not for some reason.They are hidden,but are widely seen in behavioral applications in consumer behavior analysis,diagnosis of diseases and health insurance.However,all the existing NSP mining algorithms we have found so far have not considered the repetitive sequential pattern(RSP)mining problem,in which RSP are repetitive not only across different sequences but also within a sequence.If a data sequence contains a negative sequence repeatedly,it then causes the negative sequence to have a high repetitive support.Hence it can help analysts capture more useful information.Therefore,this paper studies the repetitive sequential pattern mining systematically and explores key techniques of finding repetitive positive sequential patterns(RPSP)and repetitive negative sequential patterns(RNSP)efficiently.In addition,this paper studies the problems of existing NSP mining methods,proposes a fast NSP mining approach and an efficient NSP mining approach from both frequent and infrequent positive sequential patterns.Specific contents are as follows:1.Mining repetitive positive sequential patternsIn this paper,we propose an algorithm,RptGSP,to efficiently mine repetitive patterns in sequence database by improving a classic algorithm GSP.And meanwhile,we provide a method to clearly determine the times that a sequence appears in a data sequence.This method takes RSP into consideration which can help the analyst to capture more useful information.2.Mining repetitive negative sequential patternsNow,very limited methods focus on mining repetitive NSP(RNSP).So this paper defines the repetitive negative containment problem and proposes an efficient algorithm,called e-RNSP,to mine RNSP.E-RNSP calculates the repetitive support only by involving the identified RPSP,without re-scanning databases.After mining RPSP by RptGSP,the NSC repetitive supports are computed by only utilizing a NSC's corresponding RPSP information.Intensive experiments clearly show that,e-RNSP can efficiently catch more negative patterns than other approaches.3.A fast negative sequential patterns mining methodIn this paper,we propose a novel and efficient data structure,bitmap,a fast method to obtain the support of NSC and a corresponding fast NSP mining algorithm,f-NSP.F-NSP uses bitmap to store the PSP's information and then obtain the support of NSC only by bitwise operations,which is greatly faster than the hash method in e-NSP.But f-NSP would consume more storage space than e-NSP when PSPs' supports are less than the support threshold sdsup,a value by our theoretical storage space analysis.So we further propose a self-adaption storage strategy and a corresponding algorithm f-NSP+ to overcome this deficiency.Experimental results show that f-NSP is tens to hundreds times faster than e-NSP,but in some cases,e-NSP still has its advantages.4.An Efficient NSP mining method from infrequent positive sequencesIn general,NSP mining methods use positive sequential patterns(PSP)to generate the NSC,neglecting a negative pattern can also be found from infrequent positive sequences(IPS).This paper proposes an efficient algorithm called e-NSPFI for mining NSP from both frequent and infrequent sequential patterns efficiently.Nevertheless,the PSP mining procedure will generate a huge number of IPS,so a vital task is how to select the available IPS.Then we propose a strategy to restrict which IPS are available to generate NSC,and give a storage optimization method to hold this IPS information.
Keywords/Search Tags:repetitive sequential patterns, positive sequential patterns, negative sequential patterns, infrequent sequences
PDF Full Text Request
Related items