
Research On Active Learning In Regression Problems

Posted on: 2021-03-10
Degree: Master
Type: Thesis
Country: China
Candidate: Z A Liu
Full Text: PDF
GTID: 2518306107960609
Subject: Control Science and Engineering
Abstract/Summary:
Regression is a type of machine learning problem, and labeled samples are essential for training regression models. In some practical applications, however, unlabeled samples are easy to obtain but their true labels are hard to acquire, because labeling usually takes considerable manpower, material resources, or time. For such regression problems, active learning can effectively reduce the labeling cost. Yet most existing active learning research focuses on classification, and little of it addresses regression. This thesis focuses on offline pool-based active learning for regression (ALR): given a sample pool, how to select a few valuable samples to label so that the regression model trained on them achieves the best possible performance.

This thesis first compares unsupervised ALR algorithms with supervised ALR algorithms and points out some advantages of unsupervised ALR. Second, it establishes a mathematical model for unsupervised ALR algorithms and proposes a new indicator that can predict the accuracy of the regression model without any true label information. Third, it carries over the three essential criteria of supervised ALR, "diversity", "representativeness", and "informativeness", to unsupervised ALR, and explains them theoretically using the proposed mathematical model and the new indicator. Next, it proposes a framework for optimizing the set of candidate samples in unsupervised ALR. The framework uses alternating optimization to split the multi-objective optimization problem into multiple single-objective optimization problems. Based on this framework, two new unsupervised ALR algorithms, iRDM and IRD, are proposed. The iRDM algorithm measures and integrates "diversity" and "representativeness". The IRD algorithm considers "diversity" and "representativeness" and, in addition, measures "informativeness" for the linear regression model and integrates it into the objective function.

Finally, extensive experiments were performed on 12 public regression datasets covering multiple practical application areas. For each dataset, the two proposed unsupervised ALR algorithms (iRDM and IRD) and seven state-of-the-art ALR algorithms were implemented in MATLAB 2019 and used with linear regression (ridge) and kernel regression (RBF SVR). The results showed that the two proposed algorithms performed better and more stably than the existing unsupervised ALR algorithms. Furthermore, they even outperformed supervised ALR algorithms when the number of labeled samples was very small. It was also verified that using these two algorithms to select the first few samples to label for a supervised ALR algorithm can improve its performance.

The mathematical model and the evaluation indicator proposed in this thesis can provide theoretical support and new ideas for subsequent research on unsupervised ALR. The two proposed unsupervised ALR algorithms reasonably measure and integrate the three essential criteria: "diversity", "representativeness", and "informativeness". Compared with the existing ALR algorithms, they can reduce the labeling effort more effectively. They can also be combined with any supervised ALR algorithm to further improve its performance by generating a better initial regression model.
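
To make the pool-based, unsupervised setting concrete, the following is a minimal sketch of one simple way to choose samples to label without using any labels. It is not the iRDM or IRD algorithm from the thesis, whose objective functions are not given in this abstract. As a stand-in, it assumes that k-means clustering of the pool provides "diversity" (one sample per cluster) and that picking the sample nearest each centroid provides "representativeness". The function names `kmeans`, `select_samples`, and `fit_ridge`, and the synthetic data, are hypothetical and exist only for illustration.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means. Returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids with k distinct pool samples.
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Squared distance from every sample to every centroid: shape (n, k).
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # leave an empty cluster's centroid as is
                centroids[j] = members.mean(axis=0)
    return centroids, labels


def select_samples(X, k, seed=0):
    """Assumed selection rule: one pool index per cluster (diversity),
    namely the sample closest to that cluster's centroid (representativeness)."""
    centroids, labels = kmeans(X, k, seed=seed)
    selected = []
    for j in range(k):
        idx = np.flatnonzero(labels == j)
        if idx.size == 0:
            continue  # skip empty clusters
        d2 = ((X[idx] - centroids[j]) ** 2).sum(axis=1)
        selected.append(int(idx[d2.argmin()]))
    return selected


def fit_ridge(X, y, lam=1.0):
    """Closed-form ridge regression with an unpenalised intercept.
    Returns weights [intercept, w_1, ..., w_d]."""
    Xb = np.hstack([np.ones((len(X), 1)), X])
    reg = lam * np.eye(Xb.shape[1])
    reg[0, 0] = 0.0  # do not shrink the intercept
    return np.linalg.solve(Xb.T @ Xb + reg, Xb.T @ y)


if __name__ == "__main__":
    # Synthetic pool (an assumption for illustration, not thesis data).
    rng = np.random.default_rng(0)
    X_pool = rng.normal(size=(200, 3))
    y_pool = X_pool @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.normal(size=200)

    chosen = select_samples(X_pool, k=10)      # indices to send for labeling
    w = fit_ridge(X_pool[chosen], y_pool[chosen], lam=0.1)

    # Evaluate the model trained on the few labeled samples over the whole pool.
    pred = np.hstack([np.ones((200, 1)), X_pool]) @ w
    print("selected indices:", chosen)
    print("pool RMSE:", float(np.sqrt(np.mean((pred - y_pool) ** 2))))
```

Only the selected samples need true labels, which is the point of the setting: the selection step sees `X_pool` alone, and `y_pool` is consulted only for the chosen indices and for the final evaluation.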
Keywords/Search Tags: Active learning, Unsupervised learning, Linear regression, Kernel function