
Research On Adaptive Sampling Methods Based On Label Noise Filtering

Posted on: 2022-04-08
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zheng
GTID: 2518306575467084
Subject: Computer technology
Abstract/Summary:
Nowadays, machine learning is widely used in all aspects of production and daily life. Data is the raw fuel of machine learning, so how to screen and process data correctly has long been an active direction in both academic and industrial research. To improve data quality, that is, to improve the effectiveness and efficiency of data classification, researchers have proposed many data processing methods with different characteristics, such as methods that reduce class imbalance, handle noisy data, or sample large-scale data. However, imbalance reduction, noise handling, and large-scale data sampling are usually each performed by a separate, independent sampling algorithm. Data in real scenarios is usually complex and must be processed or filtered by several sampling algorithms. Applying many algorithms repeatedly to the same data, however, may leave the later algorithms without sufficient information, which degrades the overall effect of data processing and sampling. Therefore, this thesis proposes a general sampling method based on a completely random forest that reduces data imbalance and handles noisy data within a single sampling process. Furthermore, an adaptive parameter search algorithm is proposed to improve this method. Finally, an adaptive data sampling system is designed that integrates the two algorithms. The main work of this thesis is as follows:

1. By studying the structure of the completely random forest, a multifunctional sampling algorithm is proposed. First, the concept of a node label is defined, and the characteristics of noise points and redundant points are identified from the results of the voting method. The rules that describe these two types of points are then summarized. Finally, the points marked as noise or redundant are filtered out, which realizes integrated label noise filtering, reduces data imbalance, and compresses the number of samples.
2. By studying voting methods and boosting methods, an adaptive learning algorithm combined with boosting is presented. To make the completely random space sampling method fully parameter-free, Boosting is incorporated into the tree construction process so that the algorithm determines the number of trees adaptively; a stable state of the algorithm is also defined. Sampling is thereby fully automated. This random space sampling method improves both the speed and the stability of the algorithm.
3. An adaptive integrated data sampling system is implemented. Combined with the adaptive completely random space sampling algorithm, it performs label noise filtering, imbalance processing, and sample compression on the data input to the system.
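The noise-voting idea behind contribution 1 and the adaptive tree count behind contribution 2 can be illustrated with a small Python sketch. It is not the thesis's algorithm: the exact node-label rules, the redundant-point rule, and the Boosting-based procedure are not given in this abstract, so the sketch makes three assumptions. (a) Completely random trees split on a random feature at a randomly chosen observed value until every leaf is pure. (b) The node label is the majority label of a node, and a sample receives a noise vote in a tree when fewer than half of the samples in its nearest ancestor holding at least `min_context` samples share its label. (c) In place of Boosting, a simple stability rule keeps adding trees until the set of majority-flagged samples stays unchanged for `patience` consecutive trees. All names (`grow_tree`, `filter_label_noise`, `min_context`, `patience`, `min_trees`, `max_trees`) are hypothetical. Redundant-point compression is not sketched.

```python
import random
from collections import Counter


def grow_tree(X, y, idx, rng, path, min_context, votes):
    """Grow one completely random tree over samples `idx` and cast noise votes.

    `path` holds (size, label_counts) for every ancestor of the current node;
    the majority label in label_counts plays the role of the node label.
    """
    labels = Counter(y[i] for i in idx)
    # Features that still take more than one value inside this node.
    splittable = [f for f in range(len(X[0])) if len({X[i][f] for i in idx}) > 1]
    if len(labels) == 1 or not splittable:  # pure node (or duplicates): leaf
        for i in idx:
            # Nearest ancestor with at least `min_context` samples, if any.
            context = next(((s, c) for s, c in reversed(path) if s >= min_context), None)
            if context is not None:
                size, counts = context
                # Vote if fewer than half of that ancestor's samples share this label.
                if 2 * counts[y[i]] < size:
                    votes[i] += 1
        return
    # Completely random split: random feature, random observed value as threshold.
    f = rng.choice(splittable)
    values = sorted({X[i][f] for i in idx})
    t = values[rng.randrange(1, len(values))]  # both children are non-empty
    path.append((len(idx), labels))
    grow_tree(X, y, [i for i in idx if X[i][f] < t], rng, path, min_context, votes)
    grow_tree(X, y, [i for i in idx if X[i][f] >= t], rng, path, min_context, votes)
    path.pop()


def filter_label_noise(X, y, min_trees=10, max_trees=200, patience=10,
                       min_context=3, seed=0):
    """Add trees until the majority-vote noise set is stable (hypothetical rule).

    Returns (kept indices, flagged indices, number of trees grown).
    """
    rng = random.Random(seed)
    votes = [0] * len(y)
    flagged, stable, n_trees = set(), 0, 0
    for n_trees in range(1, max_trees + 1):
        grow_tree(X, y, list(range(len(y))), rng, [], min_context, votes)
        # A sample is flagged when more than half of the trees voted for it.
        current = {i for i, v in enumerate(votes) if 2 * v > n_trees}
        stable = stable + 1 if current == flagged else 0
        flagged = current
        if n_trees >= min_trees and stable >= patience:
            break
    keep = [i for i in range(len(y)) if i not in flagged]
    return keep, sorted(flagged), n_trees


if __name__ == "__main__":
    # Toy 1-D data: class 0 on 0..9, class 1 on 100..102, and the point at
    # 4.5 deliberately relabelled as class 1 (index 5).
    X = [[v] for v in [0, 1, 2, 3, 4, 4.5, 5, 6, 7, 8, 9, 100, 101, 102]]
    y = [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1]
    keep, flagged, n_trees = filter_label_noise(X, y)
    print("flagged:", flagged, "trees used:", n_trees)
```

In the toy data the relabelled point at 4.5 sits among class-0 points, so every tree votes for it, while points deep inside class 0 never receive a vote. Points near the boundary with the class-1 cluster can still collect some votes, which is one reason a practical filter needs more refined node-label rules than this sketch.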
Keywords/Search Tags: data sampling, noise filtering, complete random forest, adaptive