
Research and Application of Random Forest Algorithm Based on Semi-supervised Learning

Posted on: 2014-07-07
Degree: Master
Type: Thesis
Country: China
Candidate: X L Liu
Full Text: PDF
GTID: 2268330401984061
Subject: Computer software and theory

Abstract/Summary:
Machine learning is one of the core topics in artificial intelligence and comprises three important areas: supervised, semi-supervised, and unsupervised learning. Supervised learning needs labeled examples for training in order to ensure its generalization capacity. Unsupervised learning does not need labeled examples, but it cannot guarantee the precision of the model. With the development of computer application technology and the rising level of enterprise informatization, traditional means of quality control can no longer meet the needs of actual production. With the introduction of new detection technology, unlabeled data are readily available, but labeled data are fairly expensive to obtain because labeling requires human effort. Therefore, semi-supervised learning, which combines a large amount of unlabeled data with a limited number of labeled examples, has become a hot topic.

Traditional classification algorithms have difficulty obtaining a precise classification model from a small number of labeled examples, which limits their usefulness in practical applications. When semi-supervised learning is introduced into a traditional classification algorithm, the additional information in the unlabeled examples can be exploited to guide the construction of the classification model and improve classification performance. Experiments validate that semi-supervised learning has important theoretical and practical value in the classification of near-infrared spectral data: it can reduce the cost and time of labeling examples and can improve classification accuracy.

The major innovative contributions are as follows:

(1) A new depuration-based semi-supervised random forest is proposed. Semi-supervised learning and a data depuration strategy are introduced into the traditional random forest. The label of each unlabeled example is predicted by the concomitant classifier set, and an example whose confidence is greater than the default threshold is added to the training set. To prevent mislabeled examples from degrading the performance of the classifiers, an algorithm convergence condition and a data depuration strategy are applied to the training set. The convergence condition ensures that classifier performance gradually increases, and the data depuration uses the RemoveOnly method to reduce mislabeled examples. Experiments validate that the algorithm has better generalization capacity and resolves the difficulty of modeling when labeled samples are insufficient.

(2) The proposed algorithm is applied to the sensory evaluation of cigarette products using NIR spectroscopy. Contrast experiments validate that the algorithm yields a better-performing and more robust classification model on the NIR data and has practical value in engineering applications. It offers significant guidance for actual production.
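
The pseudo-labeling loop described in contribution (1) can be illustrated with a minimal sketch. This is not the thesis's implementation, and the abstract does not give the exact rules, so several parts are assumptions: a small ensemble of bootstrapped one-split trees stands in for a full random forest; the whole ensemble votes on each unlabeled example, whereas a concomitant classifier set would exclude the tree being retrained; the loop simply stops when no example passes the threshold, which is not the thesis's convergence condition; and the remove-only filter drops pseudo-labeled examples that the retrained ensemble disagrees with, without relabeling them. The names `fit_stump`, `fit_forest`, `vote`, and `self_train` are hypothetical.

```python
import random
from collections import Counter

def fit_stump(X, y, rng):
    """Fit a one-split tree on a bootstrap sample, using one random feature."""
    n = len(X)
    idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
    f = rng.randrange(len(X[0]))                 # randomly chosen feature
    vals = sorted({X[i][f] for i in idx})
    # Candidate thresholds: midpoints between consecutive distinct values.
    cuts = [(a + b) / 2 for a, b in zip(vals, vals[1:])] or [vals[0]]
    best = None
    for t in cuts:
        left = [y[i] for i in idx if X[i][f] <= t]
        right = [y[i] for i in idx if X[i][f] > t]
        maj_l = Counter(left).most_common(1)[0][0]
        maj_r = Counter(right).most_common(1)[0][0] if right else maj_l
        err = sum(v != maj_l for v in left) + sum(v != maj_r for v in right)
        if best is None or err < best[0]:
            best = (err, f, t, maj_l, maj_r)
    return best[1:]  # (feature, threshold, left label, right label)

def fit_forest(X, y, n_trees, rng):
    return [fit_stump(X, y, rng) for _ in range(n_trees)]

def vote(forest, x):
    """Return the majority label and the fraction of trees that voted for it."""
    votes = Counter(l if x[f] <= t else r for f, t, l, r in forest)
    label, count = votes.most_common(1)[0]
    return label, count / len(forest)

def self_train(X_lab, y_lab, X_unl, threshold=0.75, n_trees=25,
               max_rounds=10, seed=0):
    rng = random.Random(seed)
    X, y = list(X_lab), list(y_lab)
    n_orig = len(X)            # original labeled examples are never removed
    pool = list(X_unl)
    forest = fit_forest(X, y, n_trees, rng)
    for _ in range(max_rounds):
        remaining, added = [], 0
        for x in pool:
            label, conf = vote(forest, x)
            if conf > threshold:           # confident: add with pseudo-label
                X.append(x)
                y.append(label)
                added += 1
            else:
                remaining.append(x)
        pool = remaining
        if added == 0:                     # assumed stop rule, see lead-in
            break
        forest = fit_forest(X, y, n_trees, rng)
        # Remove-only filter (assumed form): drop pseudo-labeled examples whose
        # label disagrees with the retrained ensemble; never relabel them.
        kept = [i for i in range(len(X))
                if i < n_orig or vote(forest, X[i])[0] == y[i]]
        if len(kept) < len(X):
            X, y = [X[i] for i in kept], [y[i] for i in kept]
            forest = fit_forest(X, y, n_trees, rng)
    return forest, X, y
```

Calling `self_train(X_lab, y_lab, X_unl)` with feature vectors given as lists of floats returns the final ensemble together with the enlarged training set. In practice a library random forest would replace the stumps, and the vote fraction would be replaced by whatever confidence estimate that library exposes.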
Keywords/Search Tags: Semi-supervised Learning, Classification, Random Forest, Near-infrared, Sensory Evaluation