
Research On Label Noise Detection Methods Based On Multi-granularity

Posted on: 2021-01-26 | Degree: Master | Type: Thesis
Country: China | Candidate: X Liang
GTID: 2428330614458452 | Subject: Computer technology
Abstract/Summary:
In machine learning, accurate classification of data is an important step: the more accurately samples are classified, the more valuable the results. A classification algorithm extracts implicit knowledge from samples with known labels and builds a model, called a classifier, that predicts the labels of unlabeled samples. Classification accuracy therefore depends on the samples used to construct the classifier. In the real world, however, data sets generally contain label noise. Label noise can reduce the generalization ability of a classifier, increase model complexity, and distort observed frequencies, so it must be handled to achieve good classification performance. Using the idea of multi-granularity, this thesis proposes two multi-granularity label noise detection methods and carries out experiments to demonstrate their effectiveness. Finally, the two methods are integrated into a multi-granularity label noise detection system. The research results are as follows.

Firstly, inspired by the relative density method, a relative density noise detection method based on multi-granularity is proposed. A simple and fast centroid-based relative density is defined first: for each sample, the distance to its homogeneous (same-class) centroid and the distance to its heterogeneous (other-class) centroid are calculated, which reduces the running time of the algorithm. Granular computing is then introduced into the centroid-based relative density by using clustering as the granulation technique, and the results are considered as a whole to decide whether a sample is label noise. Experiments show that this method can effectively detect label noise in small data sets.

Secondly, inspired by random space division, a random space division noise detection method based on multi-granularity is proposed. A complete random tree with labeled nodes is built first. In the resulting random space division, a leaf node and its parent node are regarded, following the idea of multi-granularity, as the child granularity space and the parent granularity space. The classification of each sample is analyzed by comparing the information in the two spaces, and the results are considered as a whole to decide whether the sample is label noise. Experiments show that this method can effectively detect label noise.

Thirdly, a label noise detection system based on multi-granularity is designed and implemented. The system interface is simple and easy to operate, making it convenient for professionals to use.
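
The centroid-based relative density step of the first method can be illustrated with a minimal Python sketch. The abstract gives no exact formula, so several things here are assumptions rather than the thesis's definition: the ratio of the distance to the same-class centroid over the distance to the nearest other-class centroid, the Euclidean metric, the threshold of 1.0, and the function names `centroid_relative_density` and `flag_suspected_noise`. The clustering-based granulation step is not shown.

```python
import numpy as np


def centroid_relative_density(X, y):
    """Assumed score: d(sample, own-class centroid) / d(sample, nearest other-class centroid).

    Larger values mean the sample sits closer to another class than to its own.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)  # sorted unique labels
    # One centroid per class, computed from the (possibly noisy) given labels.
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    # dists[i, k] = Euclidean distance from sample i to the centroid of classes[k].
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    own = np.searchsorted(classes, y)  # column index of each sample's own class
    rows = np.arange(len(X))
    d_homo = dists[rows, own]
    # Mask out the own-class column, then take the nearest remaining centroid.
    dists_other = dists.copy()
    dists_other[rows, own] = np.inf
    d_hetero = dists_other.min(axis=1)
    # Guard against division by zero when a sample coincides with a centroid.
    return d_homo / np.maximum(d_hetero, 1e-12)


def flag_suspected_noise(X, y, threshold=1.0):
    """Flag samples whose assumed relative density exceeds the (hypothetical) threshold."""
    return centroid_relative_density(X, y) > threshold
```

Because only one centroid per class is computed, the cost is linear in the number of samples times the number of classes, which is consistent with the abstract's remark that the centroid formulation shortens running time compared with neighbor-based density estimates.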
Keywords/Search Tags: label noise, noise detection, multi-granularity, relative density, space division