
Semi-supervised Learning Based On Information Theory And Functional Dependency Rules Of Probability

Posted on: 2014-02-28  Degree: Master  Type: Thesis
Country: China  Candidate: Y Yan  Full Text: PDF
GTID: 2248330395497264  Subject: Software engineering
Abstract/Summary: PDF Full Text Request
Machine learning is a core component of data mining. With the rapid development of computer and information technology, people's ability to collect and store data has greatly improved, leading to the accumulation of massive amounts of data in daily life and research, the vast majority of it unlabeled. Compared with unlabeled data, labeled data is laborious and costly to obtain. To guarantee the generalization ability of the learned hypothesis, supervised learning requires a large amount of labeled training data; unsupervised learning needs no labels, but the resulting models are too coarse. Unlabeled data, by contrast, is easy to obtain and can better describe the geometry of the data space. Semi-supervised learning, which exploits a large amount of unlabeled data together with a small amount of labeled data, has therefore become a research hotspot.

The naive Bayes classifier, owing to its computational efficiency, has become the most widely used Bayesian classifier. This thesis proposes FFDC, a learning algorithm that enhances the performance of the naive Bayes classifier within a semi-supervised framework. Because there is no essential conflict between the test set and the naive Bayes classifier learned from the training set, instances labeled by the classifier can be added from the test set to the training set; by fully exploiting the effective information contained in the test set, the classifier's predictive accuracy gradually increases. To avoid propagating noise when these initially unlabeled instances are introduced into the training set, the thesis extends the concept of independent information gain from information theory to the naive Bayes classifier: only those test instances whose independent information gain exceeds the test-set average are selected, as an optimal sequence, and added to the training set. In addition, samples containing redundant attributes increase the complexity of modeling.
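The self-training loop described above can be sketched in a few dozen lines. This is a minimal sketch, not the thesis's implementation: the abstract does not fully specify how independent information gain is computed per instance, so prediction entropy below the pool average is used here as a stand-in selection criterion, and all function names are illustrative.

```python
import math
from collections import Counter, defaultdict

def train_nb(rows, labels):
    """Fit a categorical naive Bayes model with Laplace smoothing
    and return a function mapping a row to class probabilities."""
    classes = Counter(labels)
    n_attrs = len(rows[0])
    domains = [set() for _ in range(n_attrs)]
    counts = {c: [defaultdict(int) for _ in range(n_attrs)] for c in classes}
    for row, y in zip(rows, labels):
        for j, v in enumerate(row):
            domains[j].add(v)
            counts[y][j][v] += 1

    def predict_proba(row):
        scores = {}
        for c, n_c in classes.items():
            logp = math.log(n_c / len(labels))  # class prior
            for j, v in enumerate(row):
                # Laplace-smoothed conditional P(attr_j = v | class c)
                logp += math.log((counts[c][j][v] + 1)
                                 / (n_c + len(domains[j])))
            scores[c] = logp
        m = max(scores.values())
        z = sum(math.exp(s - m) for s in scores.values())
        return {c: math.exp(s - m) / z for c, s in scores.items()}

    return predict_proba

def self_train(labeled, labels, unlabeled, rounds=3):
    """Self-training: each round, pseudo-label the unlabeled rows whose
    prediction entropy is below the pool average (a confidence proxy
    standing in for the thesis's information-gain threshold)."""
    pool = list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        model = train_nb(labeled, labels)
        ents = []
        for row in pool:
            p = model(row)
            ents.append(-sum(q * math.log(q + 1e-12) for q in p.values()))
        avg = sum(ents) / len(ents)
        keep, rest = [], []
        for row, e in zip(pool, ents):
            (keep if e <= avg else rest).append(row)
        for row in keep:
            p = model(row)
            labeled.append(row)
            labels.append(max(p, key=p.get))  # pseudo-label
        pool = rest
    return train_nb(labeled, labels)
```

Selecting only below-average-entropy instances mirrors the thesis's idea of adding the "optimal sequence" first, so early pseudo-labels are the least likely to inject noise into the training set.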
This thesis exploits the equivalence, under certain conditions, between association rules and functional dependencies: it first mines the association rules implied among the attributes and converts them into functional dependencies, then derives further functional dependency rules via the Armstrong axioms to find and delete the redundant attributes, so that the computational complexity of modeling is reduced exponentially. The FFDC algorithm also chooses a different training set for each test record to improve the classifier's predictive accuracy. An empirical study on 10 data sets from the UCI machine learning repository shows that FFDC significantly outperforms other semi-supervised learning algorithms in generalization and probability estimation performance.
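For the purpose of detecting redundant attributes, derivation under the Armstrong axioms amounts to computing attribute-set closures: an attribute that lies in the closure of the remaining attributes is functionally determined by them and adds no information. A minimal sketch under that reading (function names are illustrative, and the rule-mining step that produces the functional dependencies is assumed to have run already):

```python
def closure(attrs, fds):
    """Closure of an attribute set under functional dependencies.
    fds: iterable of (lhs_frozenset, rhs_attribute) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Armstrong transitivity/augmentation: if we already have
            # everything on the left side, we also have the right side.
            if lhs <= result and rhs not in result:
                result.add(rhs)
                changed = True
    return result

def redundant_attributes(all_attrs, fds):
    """An attribute is redundant if the remaining (non-redundant)
    attributes functionally determine it."""
    redundant = set()
    for a in all_attrs:
        rest = set(all_attrs) - {a} - redundant
        if a in closure(rest, fds):
            redundant.add(a)
    return redundant
```

For example, with the single dependency A → B over attributes {A, B, C}, attribute B is flagged as redundant and can be dropped before building the naive Bayes model.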
Keywords/Search Tags: Machine Learning, Semi-Supervised Learning, Unlabeled Data, Missing Values, Naive Bayes