Research On The Sequence Mining Algorithm And Its Application In User Behavior Analysis

Posted on: 2015-12-03    Degree: Master    Type: Thesis
Country: China    Candidate: Y Xiao
GTID: 2298330467963523    Subject: Computer Science and Technology
Abstract/Summary:
With the growth of the information society, data mining technology has become part of everyday life and is used ever more widely. Sequential pattern mining, one of the most important data mining techniques, aims to discover hidden patterns and potentially useful information in large volumes of sequence data, and plays a key role in fields such as the Internet, biomedicine, finance, and natural disaster prediction. As data sets grow larger, existing sequential pattern mining algorithms become inefficient in practice, because it is too difficult for the user to find a concise set of useful patterns among the large number of sequential patterns obtained. How to extract the useful key patterns from massive sequence data sets has therefore become a problem in the sequential pattern mining field. To address it, we propose a two-stage sequential pattern mining algorithm based on the Hidden Markov Model. First, the sequence data set is preprocessed with a sequence clustering algorithm based on K-means; second, a high utility pattern mining algorithm based on the Hidden Markov Model mines the useful patterns from each cluster. Experiments on different data sets show that the two-stage algorithm performs better than the traditional algorithm. The algorithm is then applied to a practical user behavior analysis task: malfunction behavior analysis for user application performance on an IaaS cloud platform.
Firstly, this thesis studies the typical sequential pattern mining algorithms in depth and compares the advantages, disadvantages, and application scenarios of the different algorithms. For sequence similarity measurement, it proposes a similarity measure based on sequence edit distance and, on this basis, a sequence pre-clustering algorithm based on K-means to preprocess the data set.
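The pre-clustering stage can be illustrated with a minimal sketch. The abstract does not give the exact similarity formula or the cluster update rule, so everything below is an assumption: the distance is the standard unit-cost Levenshtein edit distance, and because sequences have no arithmetic mean, the K-means "centroid" step is replaced by a medoid update (the member with the smallest total distance to the rest of its cluster). The names `edit_distance` and `cluster_sequences` are hypothetical and are not taken from the thesis.

```python
import random
from typing import List, Sequence, Tuple


def edit_distance(a: Sequence, b: Sequence) -> int:
    """Unit-cost Levenshtein distance between two sequences of comparable items."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cluster_sequences(seqs: List[Sequence], k: int,
                      iterations: int = 10, seed: int = 0) -> Tuple[List[int], List[int]]:
    """K-means-style clustering with edit distance (assumed medoid update).

    Returns (labels, centers), where centers holds indices into seqs.
    """
    rng = random.Random(seed)
    centers = rng.sample(range(len(seqs)), k)
    labels = [0] * len(seqs)
    for _ in range(iterations):
        # Assignment step: each sequence joins its nearest center.
        labels = [min(range(k), key=lambda c: edit_distance(s, seqs[centers[c]]))
                  for s in seqs]
        # Update step: pick the member minimizing total distance to its cluster.
        new_centers = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # keep the old center if a cluster emptied out
                new_centers.append(centers[c])
                continue
            new_centers.append(min(
                members,
                key=lambda i: sum(edit_distance(seqs[i], seqs[j]) for j in members)))
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers
```

For example, `edit_distance("kitten", "sitting")` evaluates to 3, and `cluster_sequences(["aaaa", "aaab", "aaba", "zzzz", "zzzy", "zzyz"], k=2)` separates the `a`-like sequences from the `z`-like ones. The medoid update costs a quadratic number of distance computations per cluster, which is acceptable for a sketch but would need pruning or sampling on large data sets.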
Secondly, to address the difficulty of finding useful patterns in massive data, the thesis introduces a usefulness measure for patterns to evaluate the effectiveness of sequential patterns. A high utility pattern mining algorithm based on the Hidden Markov Model is then proposed to mine useful sequential patterns within clusters. Combining the two algorithms yields the two-stage sequential pattern mining algorithm based on the Hidden Markov Model, which obtains the set of useful sequential patterns through two stages, clustering and mining. Finally, the two-stage algorithm is applied to malfunction behavior analysis for user application performance on an IaaS cloud platform, and the results show that it is an effective way to address malfunction localization and prediction on the IaaS cloud platform.
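The abstract does not define the usefulness measure or say how the Hidden Markov Model enters it. As a generic illustration only, the sketch below shows the standard forward algorithm, which computes the probability that a given discrete HMM generates an observation sequence. Treating that probability as a score for a candidate pattern is an assumption made here for illustration, not the measure used in the thesis; the function name `forward_likelihood` and the toy parameters are likewise hypothetical.

```python
from typing import List, Sequence


def forward_likelihood(obs: Sequence[int],
                       start: List[float],
                       trans: List[List[float]],
                       emit: List[List[float]]) -> float:
    """Return P(obs | HMM) using the forward algorithm.

    start[i]    : probability of starting in hidden state i
    trans[i][j] : probability of moving from state i to state j
    emit[i][o]  : probability of state i emitting symbol o
    """
    if not obs:
        return 1.0  # the empty sequence is generated with probability 1
    n = len(start)
    # alpha[i] = P(observations so far, current state = i)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][o]
                 for j in range(n)]
    return sum(alpha)


# Toy two-state model over symbols {0, 1}; the numbers are illustrative only.
START = [0.6, 0.4]
TRANS = [[0.7, 0.3],
         [0.4, 0.6]]
EMIT = [[0.5, 0.5],
        [0.1, 0.9]]

score = forward_likelihood([0, 1], START, TRANS, EMIT)
```

Under these toy parameters, candidate patterns could be ranked by such a score within each cluster. For long sequences the product of probabilities underflows, so a practical version would work with log probabilities or rescale `alpha` at every step.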
Keywords/Search Tags:Hidden Markov Model, High Utility Sequential Pattern, Sequential Pattern Mining, Sequence Clustering, Data Mining