The Study Of Sequence Clustering Mining Algorithm Based On Sequence Pattern

Posted on: 2011-12-19
Degree: Master
Type: Thesis
Country: China
Candidate: T X Yang
Full Text: PDF
GTID: 2178330338977500
Subject: Computer application technology
Abstract/Summary:
Sequential pattern mining discovers hidden, interesting sequential relationships between events in large sequence databases and extracts frequent sequential patterns from time-based or otherwise ordered sequences. It compensates for a shortcoming of association rule mining, which does not reflect the chronological order of events. Sequential pattern mining has been widely applied in many fields, such as customer buying behavior analysis and DNA sequence pattern analysis.

This paper focuses on further clustering a sequence database with the k-means algorithm, using the results of sequential pattern mining as the basis. A new algorithm named K-SPAM (K-means algorithm of sequence pattern mining based on the Huffman Method) is proposed, built on the structural idea of the Huffman tree. It addresses a shortcoming of the k-means clustering algorithm: selecting the initial centers randomly can make the clustering results unstable.

K-SPAM clusters data sequences that contain similar patterns. Huffman's ideas are adopted to select the initial centers in the k-means algorithm. As a result, the number of iterations is reduced and the stability of the clustering is improved. Finally, experiments compare the clustering results of K-SPAM with those of the k-means algorithm to further confirm the advantages of K-SPAM.
Keywords/Search Tags: data mining, sequence pattern, k-means, dissimilarity, Huffman algorithm
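
The abstract does not say exactly how the Huffman tree idea is used to pick the initial centers, so the following Python sketch shows only one plausible reading, not the thesis's actual K-SPAM procedure. Huffman tree construction repeatedly combines the two lowest-weight nodes; by analogy, the sketch repeatedly merges the two closest groups of sequences until k groups remain and uses their means as deterministic initial centers for k-means. It assumes each data sequence is encoded as a 0/1 vector marking which frequent patterns it contains, and it uses Euclidean distance as a stand-in for the thesis's dissimilarity measure. The names `huffman_style_centers` and `k_means` are hypothetical.

```python
from itertools import combinations
from math import dist


def mean(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def huffman_style_centers(points, k):
    """Pick k initial centers by Huffman-like bottom-up merging (assumed reading).

    Start with one group per point, then repeatedly merge the two groups
    whose means are closest until only k groups remain.
    """
    groups = [[tuple(p)] for p in points]
    while len(groups) > k:
        # Find the pair of groups with the smallest distance between means.
        i, j = min(
            combinations(range(len(groups)), 2),
            key=lambda ij: dist(mean(groups[ij[0]]), mean(groups[ij[1]])),
        )
        merged = groups[i] + groups[j]
        groups = [g for idx, g in enumerate(groups) if idx not in (i, j)]
        groups.append(merged)
    return [mean(g) for g in groups]


def k_means(points, centers, max_iter=100):
    """Standard k-means from the given centers.

    Returns (centers, clusters, iterations_used).
    """
    points = [tuple(p) for p in points]
    clusters = []
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point goes to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda c: dist(p, centers[c]))
            clusters[nearest].append(p)
        # Update step: recompute each center; keep the old one if a cluster is empty.
        new_centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:
            return centers, clusters, iteration
        centers = new_centers
    return centers, clusters, max_iter


# Hypothetical data: rows are sequences, columns are four frequent patterns.
data = [
    (1, 1, 0, 0), (1, 1, 1, 0), (1, 0, 0, 0),
    (0, 0, 1, 1), (0, 1, 1, 1), (0, 0, 0, 1),
]
initial = huffman_style_centers(data, k=2)
centers, clusters, iterations = k_means(data, initial)
```

Because the initial centers are computed from the data rather than drawn at random, repeated runs on the same input give the same clustering, which is the stability property the abstract describes. The pairwise search in this sketch is cubic in the number of sequences, so it is suitable for illustration only.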