
Research On Active Learning Algorithm For Multi-label Classification

Posted on: 2020-02-26
Degree: Master
Type: Thesis
Country: China
Candidate: C C Li
Full Text: PDF
GTID: 2428330575995036
Subject: Computer technology
Abstract/Summary:
Active learning has attracted attention in machine learning, data mining, pattern recognition, and related fields. Its main purpose is to reduce the high cost of labeling instances. An active learning method first trains a classifier on a small set of labeled instances, then uses an instance selection algorithm to pick informative instances from the unlabeled data, and finally has an expert label the selected instances and updates the classifier. The core issue in active learning is how to design an instance selection algorithm that selects instances of good quality and in suitable quantity. At present, research on active learning still focuses on single-label classification. Multi-label classification is a common problem in data analysis, and labeling multi-label instances usually takes more time and costs more than labeling single-label instances. In multi-label classification, the key to improving classifier performance is to find, more accurately, the set of labels best suited to classification and add it to the attribute space. In addition, existing instance selection algorithms pay little attention to noisy data, and their selection strategies are relatively simple. In view of these problems, this thesis studies two aspects: the instance selection algorithm and multi-label attribute selection. Its contributions are as follows:

(1) To address the difficulty of measuring how informative an instance is, an active learning algorithm based on uncertainty sampling is proposed. First, multiple binary support vector machine classifiers are used to separate the positive and negative labels of each multi-label instance. The distance between the positive and negative label values is called the separation margin. The instance selection algorithm treats the instance with the smallest separation margin in the classification result as highly uncertain and rich in information. On this basis, an active learning algorithm based on a separation margin that incorporates the bias term is proposed: when an instance is selected, the bias term is used as a factor in measuring the separation margin, which steers the separation margin toward biased, non-noise instances. Second, the algorithm uses the standard deviation to measure the dispersion of each instance and selects instances with high dispersion. Finally, experimental results on several multi-label data sets demonstrate the effectiveness of the algorithm.

(2) Because classifier errors make the instance selection algorithm likely to select the wrong instances, and taking the correlation between labels into account, a multi-label active learning algorithm based on maximum correlation is proposed. First, the correlation between an instance and its label values is used to measure the uncertainty of the instance, in combination with the existing minimum confidence strategy. Second, the algorithm uses an improved two-layer multi-label model that extends the attribute space with the label values in the base classifier's output that exceed a threshold. Finally, the improved two-layer multi-label model is combined with the instance selection algorithm to improve the performance of the final classifier. Here too, experimental results on several multi-label data sets demonstrate the effectiveness of the algorithm.
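
As a rough illustration of the separation-margin selection described in contribution (1), the Python sketch below ranks unlabeled instances by the gap between their weakest positive and strongest negative label scores. It is a minimal sketch under stated assumptions, not the thesis's algorithm: the function names are hypothetical, the scores are assumed to be decision values from one binary SVM per label with 0 as the decision boundary, the fallback for instances whose labels all fall on one side is an assumption, and the bias-term weighting and standard-deviation step from the abstract are left out.

```python
from typing import List, Sequence


def separation_margin(scores: Sequence[float]) -> float:
    """Gap between the weakest positive and the strongest negative label score.

    `scores` holds one decision value per label for a single instance
    (assumed: one binary SVM per label, score > 0 means "label is positive").
    """
    positives = [s for s in scores if s > 0]
    negatives = [s for s in scores if s <= 0]
    # Assumption (not from the source): if every label falls on one side,
    # fall back to the distance of the closest score to the boundary at 0.
    if not positives or not negatives:
        return min(abs(s) for s in scores)
    return min(positives) - max(negatives)


def select_most_uncertain(
    pool_scores: Sequence[Sequence[float]], batch_size: int = 1
) -> List[int]:
    """Return indices of the `batch_size` pool instances with the smallest margin."""
    margins = [separation_margin(s) for s in pool_scores]
    # A smaller margin means positive and negative labels are barely separated,
    # so the instance is treated as more uncertain and queried first.
    ranked = sorted(range(len(margins)), key=lambda i: margins[i])
    return ranked[:batch_size]


if __name__ == "__main__":
    # Hypothetical decision values for three unlabeled instances, three labels each.
    pool = [
        [2.1, -1.8, -2.5],  # clearly separated labels
        [0.2, -0.1, -1.4],  # labels close to the decision boundary
        [1.0, 0.9, -0.7],   # moderately separated labels
    ]
    print(select_most_uncertain(pool, batch_size=2))
```

In an active learning loop, the returned indices would be sent to the expert for labeling, the newly labeled instances added to the training set, and the per-label classifiers retrained before the next round.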
Keywords/Search Tags: Active Learning, Multi-label Learning, SVM, Uncertainty Sampling, Binary Relevance