Research And Application Of Random Forest Algorithm Based On Semi-supervised Learning

Posted on:2014-07-07

Degree:Master

Type:Thesis

Country:China

Candidate:X L Liu

GTID:2268330401984061

Subject:Computer software and theory

Abstract/Summary:

Machine learning is one of the core topics in artificial intelligence and comprises three major paradigms: supervised, semi-supervised and unsupervised learning. Supervised learning requires labeled training examples to ensure its generalization ability. Unsupervised learning needs no labeled examples, but it cannot guarantee the precision of the resulting model. With the development of computer application technology and the rising level of enterprise informatization, traditional means of quality control can no longer meet the needs of actual production. With the introduction of new detection technologies, unlabeled data are readily available, whereas labeled data are expensive to obtain because labeling requires human effort. Semi-supervised learning, which combines a large amount of unlabeled data with a limited number of labeled examples, has therefore become an active research topic.

Traditional classification algorithms have difficulty building an accurate classification model from a small number of labeled examples, which limits their usefulness in practical applications. Introducing semi-supervised learning into a traditional classification algorithm lets it exploit the additional information carried by unlabeled examples to guide the construction of the classification model and improve classification performance. Experiments validate that semi-supervised learning has significant theoretical and practical value for classifying near-infrared (NIR) spectral data: it reduces the cost and time of labeling examples and improves classification accuracy.

The major innovative contributions are as follows:

(1) A new depuration-based semi-supervised random forest is proposed, which introduces semi-supervised learning and a data depuration strategy into the traditional random forest. The label of each unlabeled example is predicted by the concomitant classifier set, and examples whose prediction confidence exceeds a preset threshold are added to the training set. To prevent mislabeled examples from degrading the classifiers, an algorithm convergence condition and a data depuration strategy are applied to the training set: the convergence condition ensures that classifier performance improves gradually, and the depuration step uses the RemoveOnly method to reduce the number of mislabeled examples. Experiments validate that the algorithm has better generalization ability and addresses the difficulty of building models when labeled samples are insufficient.

(2) The proposed algorithm is applied to the sensory evaluation of cigarette products using NIR spectroscopy. Comparative experiments validate that the algorithm yields a better-performing and more robust classification model on the NIR data, has practical value in engineering applications, and offers useful guidance for actual production.
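The procedure in contribution (1) can be illustrated with a minimal sketch. It is not the thesis's implementation and rests on several assumptions: it uses scikit-learn's `DecisionTreeClassifier` and `NearestNeighbors`; it assumes the concomitant classifier set of a tree is the ensemble of all the other trees (as in Co-Forest-style methods); it assumes RemoveOnly means removing, never relabeling, a pseudo-labeled example whose k nearest neighbours mostly carry a different label; and it replaces the convergence condition, which the abstract does not spell out, with a simple stop rule. The function names `remove_only`, `forest_predict` and `semi_supervised_forest` and the parameter values (`n_trees=10`, `theta=0.75`, `k=5`, `max_rounds=5`) are hypothetical.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.tree import DecisionTreeClassifier


def remove_only(X, y, candidate_idx, k=5):
    """Assumed RemoveOnly rule: drop a candidate whose label disagrees with the
    majority label of its k nearest neighbours; never relabel. Returns kept indices."""
    if len(candidate_idx) == 0 or len(X) <= k:
        return np.arange(len(X))
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, nbrs = nn.kneighbors(X[candidate_idx])
    keep = np.ones(len(X), dtype=bool)
    for row, i in enumerate(candidate_idx):
        # Column 0 is normally the query point itself, so it is skipped.
        labels, counts = np.unique(y[nbrs[row, 1:]], return_counts=True)
        if labels[np.argmax(counts)] != y[i]:
            keep[i] = False
    return np.flatnonzero(keep)


def forest_predict(trees, X, classes):
    """Plain majority vote over all trees."""
    votes = np.array([t.predict(X) for t in trees])
    counts = (votes[:, :, None] == classes).sum(axis=0)
    return classes[counts.argmax(axis=1)]


def semi_supervised_forest(X_l, y_l, X_u, n_trees=10, theta=0.75, k=5,
                           max_rounds=5, seed=0):
    rng = np.random.RandomState(seed)
    classes = np.unique(y_l)
    n_l = len(X_l)

    def fit_tree(X, y):
        tree = DecisionTreeClassifier(max_features="sqrt",
                                      random_state=rng.randint(1 << 30))
        return tree.fit(X, y)

    # Each tree keeps a fixed bootstrap sample of the labeled data.
    boots = [rng.randint(0, n_l, n_l) for _ in range(n_trees)]
    trees = [fit_tree(X_l[b], y_l[b]) for b in boots]
    prev_sets = None

    for _ in range(max_rounds):
        votes = np.array([t.predict(X_u) for t in trees])  # (n_trees, n_u)
        new_trees, new_sets = [], []
        for i in range(n_trees):
            # Concomitant classifier set of tree i (assumed): all trees except tree i.
            others = np.delete(votes, i, axis=0)
            counts = (others[:, :, None] == classes).sum(axis=0)
            pseudo = classes[counts.argmax(axis=1)]
            conf = counts.max(axis=1) / others.shape[0]
            sel = np.flatnonzero(conf > theta)  # confidence above threshold
            X_aug = np.vstack([X_l[boots[i]], X_u[sel]])
            y_aug = np.concatenate([y_l[boots[i]], pseudo[sel]])
            # Depuration only screens the pseudo-labeled rows (indices >= n_l).
            keep = remove_only(X_aug, y_aug, np.arange(n_l, len(y_aug)), k)
            new_trees.append(fit_tree(X_aug[keep], y_aug[keep]))
            new_sets.append(tuple(sel[keep[keep >= n_l] - n_l].tolist()))
        trees = new_trees
        # Simplified stand-in for convergence: stop once the kept sets repeat.
        if new_sets == prev_sets:
            break
        prev_sets = new_sets
    return trees, classes
```

Calling `semi_supervised_forest(X_l, y_l, X_u)` returns the trees and the class array, and `forest_predict` gives majority-vote predictions. In this sketch pseudo-labels are re-selected from scratch in every round rather than accumulated, so an example the current forest no longer trusts drops out again; the abstract does not say which choice the thesis makes.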

Keywords/Search Tags:

Semi-supervised Learning, Classification, Random Forest, Near-infrared, Sensory Evaluation

Related items

1	Research On A Semi-supervised Random Forest Classification Algorithm And Its Parallelization
2	The Research Of Infrared Characteristics Identification On Banknotes Discrimination Based On Random Forest
3	Selection And Classification Of Unbalanced Data Based On Semi - Supervised And Integrated Learning
4	Semi Supervised Classification Of Polarimetric SAR Based On Sparse Graphs
5	Chinese Question Classification, Based On Semi-supervised Learning
6	Research On The Application Of Geometric Information In The Semi-supervised Learning
7	Research Of Reliable Semi-supervised Classification
8	Research On Semi-supervised Clustering And Classification Algorithm
9	Research On Progressively Semi-supervised Text Classification Based On Markov Random Walk
10	Robust Semi-supervised Classification Method Search For Noisy Labels Based On Self-paced Learning