Research On Label Noise Filtering Learning Algorithm Based On Multi-granularity

Posted on: 2021-02-01
Degree: Master
Type: Thesis
Country: China
Candidate: B G Wang
Full Text: PDF
GTID: 2428330614458385
Subject: Computer Science and Technology
Abstract/Summary:
In supervised learning, the label quality of the training data is critical to the learning effect. However, real data often contain mislabeled samples, which are called label noise. Label noise usually harms the training of a classification model: it increases training time, reduces the model's classification ability, and increases model complexity.

The relative density method is an effective and general method for filtering label noise, but its O(N^2) time complexity makes it inefficient on large data sets. Granular computing is a scalable, efficient, and robust approach that uses simple, low-cost approximate solutions instead of precise solutions. This thesis uses the characteristics of granular computing to improve the traditional relative density method and proposes a fast relative density method based on space partitioning. The method first partitions the sample space into a series of sub-partitions and then calculates the relative density of the samples within each partition, which avoids the time overhead of calculating relative density globally as the traditional method does. Experiments show that the proposed method is much more efficient than the traditional relative density method while maintaining its denoising ability.

The method is then further improved into a multi-layer space partition relative density algorithm. The relative density is calculated multiple times under several different partition results; that is, label noise is detected jointly at multiple granularities. Because space partitioning proceeds dynamically, the relative density can be calculated multiple times during one complete partitioning, so the improved method remains efficient. Experiments show that this method is not only efficient but also more accurate than the traditional relative density method.

Existing sampling methods often give unsatisfactory results on data with label noise, whereas the multi-granularity approach is inherently scalable and robust. This thesis therefore introduces the concept of the granular ball and proposes a general sampling method, called granular ball sampling, that is not limited to any particular data set, classifier, or scenario. Granular ball sampling not only reduces the number of samples and compresses the data, but also filters label noise in the data, thereby improving data quality.
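The idea of computing relative density inside partitions rather than globally can be illustrated with a minimal Python sketch. The abstract does not give the thesis's formulas, so everything specific here is an assumption made for illustration: relative density is taken as the ratio of the mean distance to the k nearest same-label neighbours over the mean distance to the k nearest other-label neighbours, a sample is flagged when that ratio exceeds a threshold of 1, and the space is split k-d-tree style at the median of the highest-variance feature. The function names `relative_density`, `partition`, and `flag_noise_partitioned` are hypothetical.

```python
import numpy as np


def relative_density(X, y, k=3):
    """Assumed definition: mean distance to the k nearest same-label
    neighbours divided by the mean distance to the k nearest other-label
    neighbours. Larger values mean the label fits its neighbourhood worse."""
    n = len(X)
    # All pairwise Euclidean distances: quadratic in the number of samples.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    ratio = np.zeros(n)
    for i in range(n):
        same_mask = y == y[i]
        same_mask[i] = False  # a sample is not its own neighbour
        same = np.sort(d[i, same_mask])[:k]
        other = np.sort(d[i, y != y[i]])[:k]
        if other.size == 0:
            continue  # only one label present here: leave ratio at 0
        if same.size == 0:
            # Illustrative choice: a label that appears nowhere else in
            # this block, among other labels, is treated as maximally suspect.
            ratio[i] = np.inf
            continue
        hetero = other.mean()
        ratio[i] = same.mean() / hetero if hetero > 0 else np.inf
    return ratio


def partition(X, idx, max_size):
    """Recursively split the index array at the median of the
    highest-variance feature until each block holds at most max_size
    samples (max_size must be at least 1)."""
    if len(idx) <= max_size:
        return [idx]
    axis = np.argmax(X[idx].var(axis=0))
    order = idx[np.argsort(X[idx, axis])]
    mid = len(order) // 2
    return partition(X, order[:mid], max_size) + partition(X, order[mid:], max_size)


def flag_noise_partitioned(X, y, k=3, max_size=64, threshold=1.0):
    """Flag suspected label noise using only partition-local neighbours."""
    noisy = np.zeros(len(X), dtype=bool)
    for idx in partition(X, np.arange(len(X)), max_size):
        noisy[idx] = relative_density(X[idx], y[idx], k) > threshold
    return noisy
```

With N samples split into blocks of at most m samples, the pairwise distance work in this sketch drops from about N^2 to about N·m. The cost is that a sample's neighbours are drawn only from its own block, so the flags near block boundaries can differ from those of a global computation; this is the trade-off that evaluating several partition granularities is meant to address.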
Keywords/Search Tags:label noise filtering, granular computing, relative density, space partitioning, granular ball sampling