Font Size: a A A

Research On Key Techniques Of Mining Negative Sequential Patterns Based On Non-occurring Items

Posted on:2019-08-05Degree:MasterType:Thesis
Country:ChinaCandidate:P QiuFull Text:PDF
GTID:2428330548487000Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
Sequential pattern mining refers to the pattern of mining relative time or other patterns with high frequency.Different from positive sequential patterns?PSP?only consider occurring items,negative sequential patterns?NSP?also consider non-occurring items and can provide more comprehensive actionable information.Therefore,more and more people pay attention to it.However,most of the existing methods introduce so strict constraints that many meaningful patterns would be lost.For example,negative element constraint constrains that the smallest negative unit in a negative sequential candidates?NSC?is an element.In detail,the forms of s1=<??a,b??c,d?>and s2=<?a,b???c,d?>are allowed and the form of s3=<??a,b??c,?d?>is not allowed.But the form of s3 is more common than s1 and s2 in real life.In order to solve this problem,this paper removes the constraint and researches on mining negative sequential patterns based on non-occurring items.The key problems,including the definition complexity of negative containment,the cost of support computation etc.,are researched.In addition,this paper also researches on the problem of top-k NSP mining,the unfairness of using single minimum support to select NSP and many valuable NSP are implied in infrequent positive sequences?IPS?.The main contributions of this paper are summarized as follows:1.We propose a NSP mining algorithm based on non-occurring items--NegI-NSP.First,NegI-NSP algorithm proposes a loosely constrained mechanism,i.e.,realized NSP mining based on non-occurring items.Second,NegI-NSP algorithm proposes a definition of negative containment based on non-occurring items.The definition can cover all cases of whether a data sequence contains a negative sequence based on non-occurring items.Final,NegI-NSP algorithm proves the equations to calculate negative sequences'support.It only uses the information of corresponding PSP,and effectively improves the time efficiency of algorithm.2.We propose a top-k NSP mining algorithm based on non-occurring items TopkI-NSP.Although NegI-NSP algorithm realized mining NSP based on non-occurring items,it did not solve that because of limited professional knowledge,users are very difficult to set a rational minimum threshold and obtain an expected number of patterns.To solve this problem,we propose the TopkI-NSP algorithm.First,TopkI-NSP algorithm does not require users to set minimum support?ms?threshold and get the expected number of patterns.Second,TopkI-NSP algorithm considers the influence of PSP on NSP mining,and proposes corresponding strategies.Final,TopkI-NSP algorithm proposes a strategy of optimizing time efficiency.Experimental results show that the strategy is very effective.3.We propose a NSP mining algorithm based on 2-level multiple minimum supports--I-msNSP.NegI-NSP algorithm only found NSP from PSP,neglecting many valuable NSP can also be found from IPS.To solve this problem,we propose the I-msNSP algorithm.I-msNSP uses 2-level multiple minimum supports?2-LMMS?to constrain IPS.That is,according to the frequency of different single sequences in database,we set two reasonable ms for each item to select PSP and IPS.Experimental results show that I-msNSP algorithm can effectively mine NSP from IPS.4.We propose a NSP mining algorithm based on multilevel minimum supports--MLMS-NSP.NegI-NSP algorithm only uses single minimum support to select patterns,which is unfair to long size sequences.If the support was too high,a small number of long frequent sequences would be discovered;if the support was too low,a large number of short frequent sequences would be discovered,which would increase the difficulties for users to choose actionable sequential patterns.To solve this problem,we propose the MLMS-NSP algorithm.MLMS-NSP assigns different minimum supports to sequences with different sizes,which can be used to select more valuable NSP.Second,MLMS-NSP algorithm sets minimum support threshold to minie IPS.Final,the experimental results prove that the MLMS-NSP algorithm is very effective.
Keywords/Search Tags:negative sequence patterns mining, top-k negative sequence patterns, infrequent sequences, multi-level minimum support
PDF Full Text Request
Related items