
The Research On Category Knowledge Discovery Algorithm For Incomplete Data Sets

Posted on: 2012-12-25
Degree: Doctor
Type: Dissertation
Country: China
Candidate: R H Qi
GTID: 1118330335954657
Subject: Management Science and Engineering
Abstract/Summary:
Category knowledge discovery is a fundamental task of data mining and one of the most important goals of knowledge discovery. According to statistics, understanding and handling incomplete data takes a large share of the time and effort in machine learning and data mining applications. The processing of incomplete real-world data should therefore be treated as an important issue in classification knowledge discovery. Taking the classification of incomplete data as its starting point, this dissertation focuses on making full use of the hidden information in incomplete data sets and on efficient ways to improve data mining. The detailed contents of the research are as follows:

(1) A weighted conservative inference rule based on the correlation coefficient is proposed. The rule uses the correlation coefficient to analyze quantitatively the relationship between the attributes and the categories. Based on this idea, the weighted Naive Credal classifier is proposed and tested on publicly available international data sets. Compared with the Naive Bayes classifier and the Naive Credal classifier, this algorithm has better learning performance. On some data sets, the weighted Naive Credal classifier is comparable with the support vector machine. Compared with other existing classification algorithms, the weighted Naive Credal classifier performs better owing to its full use of the hidden information in incomplete data.

(2) A two-stage semi-supervised weighted Naive Credal classification model is presented. Current semi-supervised classifiers ignore the implicit information in incomplete data and have high complexity. To address both problems, the model divides the semi-supervised classification process into two weighted Naive Credal classification stages. Compared with the transductive Support Vector Machine (TSVM), this algorithm has lower time complexity and almost the same accuracy.

(3) A Naive Credal classifier based on a relaxed conservative inference rule for incomplete data is presented. To address the low proportion of determinately classified samples, the model relaxes the definition of interval dominance. Compared with the Naive Credal classifier and the weighted Naive Credal classifier, this algorithm effectively increases the proportion of determinately classified samples while keeping almost the same accuracy. Overall, it has better classification performance than the Naive Bayes classifier, the nearest neighbor method, the Naive Credal classifier and the weighted Naive Credal classifier. Whether it outperforms the support vector machine depends on the data set.

Finally, the weighted Naive Credal classifier, the two-stage semi-supervised weighted Naive Credal classifier and the weighted Naive Credal classifier based on the relaxed conservative inference rule are applied to a style identification data set. The validity of the algorithms is verified by experimental results that are better than those of the main existing classification algorithms.
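
The effect described in item (3), where relaxing the dominance criterion between class intervals yields more determinate classifications, can be illustrated with a small sketch. This is not the dissertation's definition: the abstract does not give the relaxed rule, so the function name `non_dominated`, the `slack` parameter and the example intervals below are all assumptions made for illustration. With `slack = 0` the function reduces to ordinary strict interval dominance.

```python
from typing import Dict, Set, Tuple

# (lower, upper) bound on a class score, e.g. a probability interval
# produced by a credal classifier. The numbers used below are made up.
Interval = Tuple[float, float]


def non_dominated(intervals: Dict[str, Interval], slack: float = 0.0) -> Set[str]:
    """Return the classes that no other class dominates.

    Class a dominates class b when
        lower(a) > lower(b)  and  lower(a) + slack > upper(b).
    With slack = 0 this is strict interval dominance. A positive slack is a
    HYPOTHETICAL relaxation, not the rule from the dissertation.
    The first condition prevents two classes from dominating each other,
    so the result is never empty.
    """
    survivors = set()
    for b, (lower_b, upper_b) in intervals.items():
        dominated = any(
            lower_a > lower_b and lower_a + slack > upper_b
            for a, (lower_a, _) in intervals.items()
            if a != b
        )
        if not dominated:
            survivors.add(b)
    return survivors


# Illustrative intervals for one sample (assumed values).
example = {"A": (0.40, 0.60), "B": (0.30, 0.50), "C": (0.05, 0.10)}

strict = non_dominated(example)              # A and B overlap: indeterminate
relaxed = non_dominated(example, slack=0.15)  # B is now dominated by A
```

A sample counts as determinately classified when exactly one class survives. In this example the strict rule leaves two candidate classes, while the relaxed rule leaves one, which is the kind of trade-off between determinacy and caution that the abstract reports.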
Keywords/Search Tags: Data Mining, Incomplete Data, Classification Algorithm, Knowledge Discovery