Research On Semi-supervised Learning Classification Algorithm Based On Multi-view

Posted on: 2015-01-05 | Degree: Master | Type: Thesis
Country: China | Candidate: P Sun | Full Text: PDF
GTID: 2268330428997993 | Subject: Computer software and theory
Abstract/Summary:
Machine learning has attracted much attention as a popular research field in computer science. In the past, machine learning mainly adopted two learning patterns, supervised learning and unsupervised learning, and datasets consisted of low-dimensional data that were either entirely labeled or entirely unlabeled. However, with the development of machine learning and data acquisition techniques, datasets have become high-dimensional, with strongly correlated attributes, few labeled examples, and many unlabeled examples. Traditional machine learning methods can no longer learn effectively from such data, and how to learn efficiently from it has become a pressing problem in various industries. Semi-supervised learning therefore emerged as a way to combine a large amount of unlabeled data with a small amount of labeled data during learning.

In recent years, with the sustained development of various technologies in machine learning and data mining, semi-supervised learning has advanced greatly in both theory and practical application. Semi-supervised learning research focuses mainly on designing learners that perform well when class labels are missing for most of the training data. The process uses a small amount of labeled data combined with a large amount of unlabeled data to generate a model with good learning performance.

The semi-supervised Naive Bayes classifier is an excellent classification algorithm that has been widely used in classification tasks because of its simplicity, speed, and high accuracy. Multi-view semi-supervised learning is an important family of methods: first, the attribute set of the data is divided into several subsets, and each subset is used to train one classifier; then each classifier provides newly labeled data to the other classifiers, so that the classifiers learn collaboratively.
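The multi-view procedure described above can be sketched for the two-view case as follows. This is a minimal illustration, not the thesis's implementation: the `TinyNB` class, the `co_train` function, the toy data, and the use of the highest posterior probability as the confidence score are all assumptions made for the example. As a simplification, both classifiers draw from and add to one shared labeled set.

```python
import math
from collections import Counter, defaultdict


class TinyNB:
    """Categorical Naive Bayes with Laplace smoothing (illustrative helper)."""

    def fit(self, rows, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.n = len(labels)
        self.counts = defaultdict(Counter)  # (feature index, class) -> value counts
        self.values = defaultdict(set)      # feature index -> values seen in training
        for row, y in zip(rows, labels):
            for j, v in enumerate(row):
                self.counts[(j, y)][v] += 1
                self.values[j].add(v)
        return self

    def predict_proba(self, row):
        log_post = {}
        for c in self.classes:
            lp = math.log(self.prior[c] / self.n)
            for j, v in enumerate(row):
                num = self.counts[(j, c)][v] + 1
                den = self.prior[c] + len(self.values[j]) + 1  # extra slot for unseen values
                lp += math.log(num / den)
            log_post[c] = lp
        top = max(log_post.values())
        exp = {c: math.exp(lp - top) for c, lp in log_post.items()}
        z = sum(exp.values())
        return {c: e / z for c, e in exp.items()}


def project(rows, view):
    """Keep only the attributes that belong to one view."""
    return [[r[j] for j in view] for r in rows]


def co_train(labeled, labels, unlabeled, view_a, view_b, per_round=1, rounds=5):
    """Two-view co-training: each view's classifier labels its most confident
    unlabeled rows and adds them to the shared labeled set."""
    labeled, labels, pool = list(labeled), list(labels), list(unlabeled)
    views = (view_a, view_b)
    for _ in range(rounds):
        for view in views:
            if not pool:
                break
            clf = TinyNB().fit(project(labeled, view), labels)
            scored = []
            for i, row in enumerate(project(pool, view)):
                proba = clf.predict_proba(row)
                guess = max(proba, key=proba.get)
                scored.append((proba[guess], i, guess))
            picked = sorted(scored, reverse=True)[:per_round]
            for _, i, guess in picked:
                labeled.append(pool[i])
                labels.append(guess)
            for i in sorted((i for _, i, _ in picked), reverse=True):
                pool.pop(i)
    models = [TinyNB().fit(project(labeled, v), labels) for v in views]
    return models, labeled, labels


# Toy data (hypothetical): attributes 0-1 form view A, attributes 2-3 form view B.
seed_rows = [("red", "round", "big", "smooth"), ("blue", "square", "small", "rough")]
seed_labels = ["pos", "neg"]
unlabeled_rows = [
    ("red", "round", "big", "smooth"),
    ("blue", "square", "small", "rough"),
    ("red", "round", "small", "smooth"),
    ("blue", "square", "big", "rough"),
]
models, grown_rows, grown_labels = co_train(
    seed_rows, seed_labels, unlabeled_rows, view_a=[0, 1], view_b=[2, 3]
)
```

Note that every classifier here labels the same number of rows (`per_round`) in each iteration, which is the fixed-quota behavior of the traditional scheme.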
However, there are still some unsolved problems in multi-view learning, for example, how to choose high-confidence data from the unlabeled dataset to label and add to the labeled dataset, and how to select an appropriate quantity of data in that process. Since the traditional multi-view semi-supervised classification algorithm does not take the performance of the individual classifiers into consideration, every classifier labels the same amount of data in each iteration. As a result, the classifiers cannot each perform at their best.

To address these problems, this paper proposes two confidence estimation methods, Max Distinction and KNearest, together with their calculation formulas. Experiments on a selected percentage of data from the UCI datasets compare Macro-Recall, Macro-Precision, and running time, and demonstrate the effectiveness of Max Distinction. The paper then proposes a novel weight-adjusting two-view semi-supervised classification method that improves on the traditional two-view semi-supervised classification algorithm. In the novel algorithm, each classifier selects data from the unlabeled dataset and inserts it into the labeled dataset according to its classification ability, so that each classifier carries its appropriate weight in the learning process. The experimental results show that this algorithm improves the Macro-Recall and Macro-Precision of the traditional two-view semi-supervised classification algorithm.
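The abstract gives neither the weight-adjusting formula nor the Max Distinction and KNearest formulas, so the sketch below is only one plausible reading. `macro_precision_recall` uses the standard definitions of the two metrics (per-class values averaged with equal class weight). `allocate_quota` is a hypothetical helper that splits a per-iteration labeling budget in proportion to an assumed ability score for each classifier, for example its accuracy on the labeled data.

```python
def macro_precision_recall(y_true, y_pred):
    """Return (macro-precision, macro-recall): the unweighted mean over classes."""
    classes = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls = [], []
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    return sum(precisions) / len(classes), sum(recalls) / len(classes)


def allocate_quota(abilities, total):
    """Split `total` rows to label among classifiers, proportionally to ability.

    Assumption: proportional allocation; the thesis may use a different rule.
    """
    score_sum = sum(abilities)
    if score_sum == 0:
        return [total // len(abilities)] * len(abilities)
    raw = [total * a / score_sum for a in abilities]
    quota = [int(r) for r in raw]
    # Give the units lost to rounding down to the largest fractional remainders.
    leftover = total - sum(quota)
    order = sorted(range(len(raw)), key=lambda i: raw[i] - quota[i], reverse=True)
    for i in order[:leftover]:
        quota[i] += 1
    return quota


# Hypothetical usage: the stronger view labels more rows in the next iteration.
quota_a, quota_b = allocate_quota([0.75, 0.25], total=8)
macro_p, macro_r = macro_precision_recall(["a", "a", "b", "b"], ["a", "b", "b", "b"])
```

In a two-view loop, the resulting quotas would replace a single fixed per-iteration count, so the more reliable classifier contributes more newly labeled data.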
Keywords/Search Tags: Semi-supervised learning, Multi-view learning, Ensemble learning, Naive Bayes, Classification