Research Of Weighted Sequential Pattern With Item Interval Mining Algorithms

Posted on: 2013-03-19
Degree: Master
Type: Thesis
Country: China
Candidate: S Li
GTID: 2248330392454760
Subject: Computer software and theory
Abstract/Summary:
Sequential pattern mining is an important research topic in data mining, with broad applications in customer purchase pattern analysis, DNA sequence analysis, Web access pattern analysis, environment monitoring, recommender systems, and other areas. By introducing weight constraints into the mining process, weighted sequential pattern mining can extract the patterns that users are genuinely interested in. However, existing weighted sequential pattern mining algorithms do not jointly consider the weights of items and the time intervals between sequence elements, and their results lack the item interval information that users care about. Their mining process also requires scanning the sequence database many times or constructing numerous intermediate databases. Moreover, most previous algorithms rely on exact matching, and their efficiency drops rapidly when mining noisy sequence databases that contain long sequences. To address these problems, this thesis studies weighted sequential pattern mining algorithms from three aspects.
First, the thesis proposes a memory-based weighted closed sequential pattern mining algorithm. The algorithm defines a novel sequence weighting approach that considers both the weights of sequence items and the time intervals between data elements. It also defines an improved, time-interval-based index set, which enables weighted closed sequential pattern mining without generating candidate sets or constructing projected databases.
Second, the thesis proposes a generalized weighted closed sequential pattern mining algorithm with item intervals. This algorithm converts item intervals into pseudo items and inserts them into the sequences to obtain an equivalent extended sequence database. It then defines item-interval constraints, related to item weights, to prune the patterns being mined. In this way the algorithm avoids mining patterns that users are not interested in and shortens the running time.
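The abstract does not state how item weights and time intervals are combined into a sequence weight. The sketch below is one possible scheme, written under explicit assumptions:

- each item has a user-assigned weight;
- each element carries an integer timestamp;
- a sequence's weight is its mean item weight scaled by `exp(-decay * mean_gap)`, so sequences whose elements are spread far apart count for less.

Weighted support is then the share of total sequence weight held by sequences that contain the pattern. The names `sequence_weight`, `contains`, `weighted_support`, and the `decay` parameter are hypothetical. The sketch does not reproduce the thesis's index set or its closed-pattern check.

```python
import math
from typing import Dict, FrozenSet, List, Tuple

# A sequence is a time-ordered list of (timestamp, itemset) elements.
Element = Tuple[int, FrozenSet[str]]
Sequence = List[Element]


def sequence_weight(seq: Sequence, item_weights: Dict[str, float],
                    decay: float = 0.1) -> float:
    """Hypothetical weight: mean item weight scaled by exp(-decay * mean gap),
    so sequences whose elements are far apart in time count for less."""
    items = [item for _, itemset in seq for item in itemset]
    if not items:
        return 0.0
    mean_item_weight = sum(item_weights.get(i, 0.0) for i in items) / len(items)
    gaps = [t2 - t1 for (t1, _), (t2, _) in zip(seq, seq[1:])]
    mean_gap = sum(gaps) / len(gaps) if gaps else 0.0
    return mean_item_weight * math.exp(-decay * mean_gap)


def contains(seq: Sequence, pattern: List[FrozenSet[str]]) -> bool:
    """True if the itemsets of `pattern` are contained, in order, in distinct
    elements of `seq` (greedy earliest matching)."""
    k = 0
    for _, itemset in seq:
        if k < len(pattern) and pattern[k] <= itemset:
            k += 1
    return k == len(pattern)


def weighted_support(db: List[Sequence], pattern: List[FrozenSet[str]],
                     item_weights: Dict[str, float], decay: float = 0.1) -> float:
    """Share of total sequence weight held by sequences that contain `pattern`."""
    weights = [sequence_weight(s, item_weights, decay) for s in db]
    total = sum(weights)
    hit = sum(wt for s, wt in zip(db, weights) if contains(s, pattern))
    return hit / total if total else 0.0


if __name__ == "__main__":
    db = [
        [(0, frozenset("a")), (1, frozenset("bc")), (2, frozenset("d"))],
        [(0, frozenset("a")), (5, frozenset("c"))],
        [(0, frozenset("b")), (1, frozenset("d"))],
    ]
    weights = {"a": 1.0, "b": 0.5, "c": 0.8, "d": 0.2}
    print(round(weighted_support(db, [frozenset("a"), frozenset("c")], weights), 3))
    # prints 0.778
```

Setting `decay=0` reduces each sequence weight to its plain mean item weight, which ignores time intervals entirely.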
The generalized algorithm also uses histogram statistics to produce a standardized description of the item intervals in the mined patterns, so that the resulting sequences carry item interval information that is valuable for user decision-making.
Finally, the thesis presents an approximate weighted sequential pattern mining algorithm. The algorithm defines a weighted similarity measure based on item weights and item intervals, clusters sequences according to this weighted similarity, and applies multiple alignment to extract consensus patterns from each cluster. By contrast, when the sequence database contains long sequences and noise, traditional algorithms that use exact matching produce many short patterns but few of the long patterns shared by many sequences.
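The approximate algorithm's first two steps (weighted similarity, then clustering) can be sketched as follows. This is a minimal illustration under assumptions not taken from the thesis:

- sequences are lists of single items;
- the distance is an edit distance whose insertion, deletion, and substitution costs come from item weights;
- clustering is a simple greedy threshold scheme.

The thesis's similarity also depends on item intervals, which this sketch leaves out. The multiple-alignment step that produces consensus patterns is not shown. All function names and the `threshold` parameter are hypothetical.

```python
from typing import Dict, List

Seq = List[str]


def weighted_edit_distance(a: Seq, b: Seq, w: Dict[str, float]) -> float:
    """Edit distance in which inserting or deleting an item costs its weight and
    substituting one item for another costs the larger of the two weights."""
    n, m = len(a), len(b)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + w[a[i - 1]]  # delete a's prefix
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + w[b[j - 1]]  # insert b's prefix
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            x, y = a[i - 1], b[j - 1]
            sub = 0.0 if x == y else max(w[x], w[y])
            d[i][j] = min(d[i - 1][j] + w[x],      # delete x
                          d[i][j - 1] + w[y],      # insert y
                          d[i - 1][j - 1] + sub)   # match or substitute
    return d[n][m]


def weighted_similarity(a: Seq, b: Seq, w: Dict[str, float]) -> float:
    """Similarity in [0, 1]: the distance never exceeds the combined weight of
    both sequences (delete all of a, insert all of b), so normalise by it."""
    total = sum(w[x] for x in a) + sum(w[y] for y in b)
    return 1.0 if total == 0 else 1.0 - weighted_edit_distance(a, b, w) / total


def greedy_cluster(db: List[Seq], w: Dict[str, float],
                   threshold: float = 0.6) -> List[List[Seq]]:
    """Put each sequence into the first cluster whose first member is at least
    `threshold` similar to it; otherwise start a new cluster."""
    clusters: List[List[Seq]] = []
    for seq in db:
        for cluster in clusters:
            if weighted_similarity(seq, cluster[0], w) >= threshold:
                cluster.append(seq)
                break
        else:
            clusters.append([seq])
    return clusters


if __name__ == "__main__":
    w = {"a": 1.0, "b": 0.5, "c": 0.8, "d": 0.2, "x": 0.1}
    db = [list("abcd"), list("abxcd"), list("dd"), list("abc")]
    for cluster in greedy_cluster(db, w):
        print(["".join(s) for s in cluster])
    # prints ['abcd', 'abxcd', 'abc'] then ['dd']
```

The noisy item `x` costs little because its weight is low, so `abxcd` still lands in the same cluster as `abcd`. This is the kind of noise tolerance that exact matching lacks.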
Keywords/Search Tags: weighted sequential pattern, closed sequential pattern, approximate sequential pattern, memory index, item interval