Study And Implementation On Techniques Of Direct Discriminative Subsequence Mining For Classifying Uncertain Data

Posted on: 2016-03-30
Degree: Master
Type: Thesis
Country: China
Candidate: R Gan
GTID: 2428330542954608
Subject: Computer software and theory
Abstract/Summary:
In traditional data classification, frequent sequences with high discriminative power are usually mined and used as classification rules, giving users information of high potential value. In recent years, as the demand for uncertain data processing has grown rapidly, mining discriminative patterns on uncertain data sets has drawn increasing attention. Compared with a traditional (certain) database, however, an uncertain database corresponds to a very large number of possible worlds, which makes the mining process much more complicated. Existing research on mining discriminative patterns from uncertain data is limited to discriminative itemsets; mining discriminative subsequences remains an open gap, and filling that gap is the focus of this study.

To address these problems, this thesis proposes a new algorithmic framework for mining discriminative sequences on uncertain data sets. Unlike previous work, it adopts a direct mining process, which avoids the computational bottlenecks of the separate feature-generation and feature-selection steps. The algorithm mines all probabilistic frequent closed sequences as its result set; because every pattern in the result set is closed, the set is concise while still retaining sufficient support information. The algorithm uses information gain and expected confidence to measure the discriminative power of the probabilistic frequent closed sequences, and the mined result set is then combined with a suitable classification algorithm to classify the data set. The framework consists of three parts:
(1) a PrefixSpan-based method for enumerating subsequences, which converts the problem into a frequent closed pattern mining problem;
(2) integration of the discriminative-power measure into the sequence mining process, together with efficient pruning strategies that reduce the search space;
(3) classification of the data set using the mined result set combined with an efficient classification algorithm.
Extensive experiments on both real and synthetic data sets show that the proposed framework and algorithm are efficient and scalable while maintaining high classification accuracy.
Keywords: Uncertain data, Discriminative sequence mining, Direct method, Classification
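The abstract does not spell out its uncertainty model. A common choice, assumed here purely for illustration, is tuple-level uncertainty: each sequence exists independently with a given probability, so every possible world is a subset of the records. Under that assumption the support of a pattern follows a Poisson-binomial distribution, and a pattern is probabilistic frequent when P(support ≥ minsup) ≥ τ. The minimal sketch below computes that probability with the standard dynamic program instead of enumerating all 2^n possible worlds. The names `support_distribution`, `frequent_probability`, `minsup` and `tau` are hypothetical and not taken from the thesis.

```python
from typing import List, Sequence, Tuple

# Hypothetical data model (an assumption, not from the thesis): each record is
# (sequence of items, existence probability); records exist independently.
UncertainDB = List[Tuple[Sequence[str], float]]


def is_subsequence(pattern: Sequence[str], seq: Sequence[str]) -> bool:
    """True if the items of `pattern` appear in `seq` in order (gaps allowed)."""
    it = iter(seq)
    return all(item in it for item in pattern)  # `in` advances the iterator


def support_distribution(pattern: Sequence[str], db: UncertainDB) -> List[float]:
    """Return dist where dist[k] = P(support of `pattern` == k) over all worlds."""
    probs = [p for seq, p in db if is_subsequence(pattern, seq)]
    dist = [1.0]  # before any record is considered, support is 0 with certainty
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            new[k] += q * (1.0 - p)  # worlds where this record is absent
            new[k + 1] += q * p      # worlds where it is present (it contains the pattern)
        dist = new
    return dist


def expected_support(pattern: Sequence[str], db: UncertainDB) -> float:
    """Linearity of expectation: sum the probabilities of the containing records."""
    return sum(p for seq, p in db if is_subsequence(pattern, seq))


def frequent_probability(pattern: Sequence[str], db: UncertainDB, minsup: int) -> float:
    """P(support >= minsup), computed without enumerating the possible worlds."""
    return sum(support_distribution(pattern, db)[minsup:])


def is_probabilistic_frequent(pattern: Sequence[str], db: UncertainDB,
                              minsup: int, tau: float) -> bool:
    return frequent_probability(pattern, db, minsup) >= tau


if __name__ == "__main__":
    db = [(["a", "b", "c"], 0.9), (["a", "c"], 0.6), (["b", "c"], 0.5)]
    print(expected_support(["a", "c"], db), frequent_probability(["a", "c"], db, 2))
```

The dynamic program runs in O(m²) time for m containing records, which is why probabilistic-frequentness checks are usually affordable even when the number of possible worlds is astronomically large.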
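Part (1) of the framework enumerates subsequences in a PrefixSpan style. The sketch below is a simplified, hypothetical version under an assumed tuple-level model (each record is a sequence with an independent existence probability). It grows patterns by prefix projection (single-item elements only) and prunes any prefix whose expected support falls below `min_esup`. Expected support is anti-monotone, since a longer pattern is contained in no more records, and it stands in here for the thesis's own pruning strategies, which the abstract does not detail. A post-filter then drops every pattern that has a longer pattern contained in exactly the same records: such a pattern has equal support in every possible world, so it can never be closed. This is a conservative filter, not the thesis's definition of probabilistic frequent closed sequences.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple

# Hypothetical data model (assumption): (sequence of items, existence probability).
UncertainDB = List[Tuple[Sequence[str], float]]
Patterns = Dict[Tuple[str, ...], FrozenSet[int]]


def prefix_span(db: UncertainDB, min_esup: float) -> Patterns:
    """Map every pattern with expected support >= min_esup to its containing record ids."""
    results: Patterns = {}

    def grow(prefix: Tuple[str, ...], projected: List[Tuple[int, int]]) -> None:
        # projected: (record id, offset where the suffix after the prefix starts)
        first_pos: Dict[str, Dict[int, int]] = {}
        for rid, start in projected:
            seq = db[rid][0]
            for pos in range(start, len(seq)):
                # keep only the leftmost occurrence of each item per record
                first_pos.setdefault(seq[pos], {}).setdefault(rid, pos)
        for item in sorted(first_pos):
            occ = first_pos[item]
            esup = sum(db[rid][1] for rid in occ)
            if esup < min_esup:
                continue  # anti-monotone pruning: no extension can reach min_esup
            pattern = prefix + (item,)
            results[pattern] = frozenset(occ)
            grow(pattern, [(rid, pos + 1) for rid, pos in occ.items()])

    grow((), [(rid, 0) for rid in range(len(db))])
    return results


def is_proper_subsequence(a: Sequence[str], b: Sequence[str]) -> bool:
    if len(a) >= len(b):
        return False
    it = iter(b)
    return all(x in it for x in a)


def closed_patterns(results: Patterns) -> Patterns:
    """Drop X when a longer Y containing X occurs in exactly the same records."""
    return {
        x: rids
        for x, rids in results.items()
        if not any(rids == rids_y and is_proper_subsequence(x, y)
                   for y, rids_y in results.items())
    }
```

For example, with records `["a","b","c"]` (0.9), `["a","c"]` (0.6) and `["b","c"]` (0.5) and `min_esup = 1.0`, the pattern `("a",)` is discarded because `("a","c")` occurs in the same two records.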
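Part (3) combines the mined patterns with a classifier, which the abstract leaves unspecified. Two common options are shown below as assumptions rather than the thesis's choice: turning each sequence into a binary vector of pattern occurrences that any standard classifier can consume, or using the scored patterns directly as an ordered rule list in the spirit of associative classification, where the highest-scoring matching rule decides the class. The rule scores could come from a discriminative measure such as information gain; the `Rule` tuple and the `default` fallback label are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical rule format (assumption): (pattern, predicted class, score).
Rule = Tuple[Tuple[str, ...], str, float]


def is_subsequence(pattern: Sequence[str], seq: Sequence[str]) -> bool:
    it = iter(seq)
    return all(item in it for item in pattern)


def to_feature_vector(seq: Sequence[str], patterns: List[Tuple[str, ...]]) -> List[int]:
    """Binary features: 1 if the mined pattern occurs in `seq`, else 0."""
    return [1 if is_subsequence(p, seq) else 0 for p in patterns]


def classify(seq: Sequence[str], rules: List[Rule], default: str) -> str:
    """Ordered rule list: the highest-scoring rule whose pattern matches wins."""
    for pattern, label, _score in sorted(rules, key=lambda r: r[2], reverse=True):
        if is_subsequence(pattern, seq):
            return label
    return default  # no rule fires
```

The feature-vector route keeps the mining and learning stages decoupled, while the rule-list route is directly interpretable: each prediction can be traced to one discriminative subsequence.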