Research On Multi-task Active Learning Algorithm Based On Sample Distribution Structure

Posted on: 2021-05-10
Degree: Master
Type: Thesis
Country: China
Candidate: Z Chang
Full Text: PDF
GTID: 2518306101475664
Subject: Computer technology
Abstract/Summary:
In recent years, the Internet field has seen a wave of artificial intelligence, and multi-task learning has received great attention in many research fields. Unlike traditional single-task learning, multi-task learning learns multiple related tasks at the same time to improve the generalization performance of each task. Information shared between related tasks can improve the learning efficiency of each task. However, traditional multi-task learning methods rely on sufficient labeled data to achieve this. In most real-life scenarios, acquiring labeled data is expensive and requires substantial manpower and material resources. In addition, stacking the data of multiple tasks introduces redundancy, enlarges the training set, and reduces the training efficiency of the model. Reducing the labeling cost and mining highly informative samples from massive data both require a selection strategy, and active learning is an effective way to provide one. In active learning, training is iterative: at each iteration, unlabeled samples are strategically selected, labeled, and added to the training set used to train the model. Under the same conditions, active learning requires fewer labeled training samples than traditional learning methods, yet it can improve model performance and achieve higher accuracy.

This thesis introduces active learning into multi-task learning and proposes two multi-task active learning classification methods based on support vector machine classifiers and sample distribution structures. The information content of a sample is measured from two perspectives: sample uncertainty and sample diversity. First, we propose a support vector-based uncertainty criterion, referred to as Classifier-Level Uncertainty (CLU). Second, we propose two diversity criteria based on partitioning methods and clustering methods, referred to as Partition-Based Diversity (PBD) and Clustering-Based Diversity (CBD), respectively. The uncertainty criterion selects support vectors, which determine the classification hyperplane, ensuring the information value of the selected samples. To preserve the distribution structure of the data, the diversity criteria select samples with representative structural information through the partitioning method and the clustering method. Each diversity criterion is combined with the uncertainty criterion, giving two different multi-task active learning methods.

Finally, we conduct text classification experiments that compare the two proposed methods with existing active learning methods. The experimental results show that the proposed methods outperform the existing active learning methods on various evaluation metrics.
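The selection step described in the abstract, an uncertainty filter followed by a diversity filter, can be illustrated with a small sketch. This is not the thesis's definition of CLU, PBD, or CBD, which the abstract does not spell out. It is a simplified single-task stand-in under stated assumptions: the SVM decision values `margins` and the cluster ids `clusters` for the unlabeled pool are taken as given inputs, samples with |f(x)| <= 1 are treated as the uncertain ones because they lie on or inside the current margin, and the function name `select_batch` is hypothetical. Information sharing between tasks is not modeled.

```python
def select_batch(margins, clusters, batch_size):
    """Pick up to batch_size indices from the unlabeled pool.

    margins[i]  -- signed SVM decision value f(x_i) (assumed precomputed)
    clusters[i] -- cluster id of sample i (assumed precomputed, e.g. by k-means)
    """
    # Uncertainty step (assumption): keep samples on or inside the margin.
    candidates = [i for i, m in enumerate(margins) if abs(m) <= 1.0]

    # Diversity step (assumption): keep one representative per cluster,
    # namely the candidate closest to the decision boundary.
    best_per_cluster = {}
    for i in candidates:
        c = clusters[i]
        current = best_per_cluster.get(c)
        if current is None or abs(margins[i]) < abs(margins[current]):
            best_per_cluster[c] = i

    # Rank the representatives by uncertainty and truncate to the batch size.
    ranked = sorted(best_per_cluster.values(), key=lambda i: abs(margins[i]))
    return ranked[:batch_size]


# Toy pool: six unlabeled samples spread over three clusters.
margins = [2.3, -0.2, 0.9, -0.5, 0.1, -1.7]
clusters = [0, 0, 1, 1, 2, 2]
print(select_batch(margins, clusters, 2))  # prints [4, 1]
```

In a full active learning loop, the returned samples would be labeled, added to the training set, and the classifier retrained before the next round of selection. Taking one representative per cluster is only one way to keep the batch spread across the data distribution.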
Keywords/Search Tags: Multi-task Classification, Active Learning, Support Vector Machine