Research On Label Noise Filtering Learning Algorithm Based On Multi-granularity

Posted on:2021-02-01

Degree:Master

Type:Thesis

Country:China

Candidate:B G Wang

GTID:2428330614458385

Subject:Computer Science and Technology

Abstract/Summary:

In supervised learning, the quality of the training labels is critical to what the model learns. However, labels in real data are often wrong. These incorrectly labeled samples are called label noise. Label noise usually harms the training of a classification model: it can increase training time, reduce the model's classification ability, and increase model complexity. The relative density method is a very effective and general method for filtering label noise, but because of its O(N^2) time complexity it is inefficient on large data sets. Granular computing is a scalable, efficient, and robust approach that replaces precise solutions with simple, low-cost approximate ones.

This thesis uses the characteristics of granular computing to improve the traditional relative density method and proposes a fast relative density method based on space partitioning. The method first partitions the sample space into a series of sub-partitions and then calculates the relative density of the samples within each partition, which avoids the time cost of computing relative density globally as the traditional method does. Experiments show that the proposed method is much more efficient than the traditional relative density method while maintaining its denoising ability.

The method is further improved into a multi-layer space-partition relative density algorithm. Relative density is calculated multiple times under several different partition results, that is, label noise is detected jointly at multiple granularities. Because the space is divided dynamically, relative density can be calculated multiple times during one complete partitioning, so the improved method remains efficient. Experiments show that this method is not only efficient but also more accurate than the traditional relative density method.

When faced with data containing label noise, existing sampling methods often give unsatisfactory results. Since the multi-granularity approach is itself scalable and robust, this thesis introduces the concept of the granular ball and proposes a general sampling method, called granular ball sampling, that is not limited to any particular data set, classifier, or scenario. Granular ball sampling not only reduces the number of samples in the data set, compressing the data, but also filters label noise from it, thereby improving data quality.
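
The relative density filter that the thesis starts from can be illustrated with a minimal Python sketch. The abstract does not state the formula, so the sketch assumes a k-nearest-neighbour ratio: a sample's relative density is the mean distance to its k nearest same-label samples divided by the mean distance to its k nearest other-label samples, and a value above 1 (the sample lies closer to another class than to its own) marks it as label noise. The function names, k = 3, the threshold of 1, and the toy data are illustrative assumptions. Comparing every sample with every other sample is where the O(N^2) cost comes from.

```python
import math

def relative_density(points, labels, k=3):
    """Relative density of every sample (assumed definition): mean distance to
    its k nearest same-label samples divided by mean distance to its k nearest
    other-label samples. Every pair of samples is compared, hence O(N^2)."""
    scores = []
    for i, p in enumerate(points):
        same = sorted(math.dist(p, q) for j, q in enumerate(points)
                      if j != i and labels[j] == labels[i])
        other = sorted(math.dist(p, q) for j, q in enumerate(points)
                       if labels[j] != labels[i])
        if not other:
            scores.append(0.0)        # no other label present: nothing contradicts i
        elif not same:
            scores.append(math.inf)   # i is the only sample carrying its label
        else:
            ks, ko = min(k, len(same)), min(k, len(other))
            scores.append((sum(same[:ks]) / ks) / (sum(other[:ko]) / ko))
    return scores

def filter_noise(points, labels, k=3, threshold=1.0):
    """Keep the indices whose relative density does not exceed the threshold."""
    return [i for i, s in enumerate(relative_density(points, labels, k)) if s <= threshold]

# Hypothetical toy data: two tight clusters plus one class-1 label at (1, 1)
# (index 6) sitting in class-0 territory.
POINTS = [(0, 0), (0, 0.2), (0.2, 0), (0.2, 0.2), (0.4, 0), (0.4, 0.2), (1, 1),
          (10, 10), (10, 10.2), (10.2, 10), (10.2, 10.2), (10.4, 10), (10.4, 10.2), (10.6, 10)]
LABELS = [0] * 6 + [1] * 8

print(filter_noise(POINTS, LABELS))  # [0, 1, 2, 3, 4, 5, 7, 8, 9, 10, 11, 12, 13]
```

Only index 6, the class-1 label placed among the class-0 samples, has a relative density above 1, so it is the only sample removed.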
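
The space-partitioning speed-up and its multi-layer variant can be sketched with the same assumed k-nearest-neighbour definition of relative density. If N samples fall into m equally sized partitions and relative density is computed only inside each one, the number of pairwise distances drops from about N^2 to m(N/m)^2 = N^2/m. The abstract does not specify the partitioning scheme or how results from different granularities are combined, so this sketch assumes a recursive median cut on the widest coordinate and a majority vote over every partition in which a sample was scored. partition_rd_filter, leaf_size, eval_size, the helper functions, and the toy data are hypothetical.

```python
import math

def _rd(points, labels, idx, k):
    """Relative density (assumed: mean distance to the k nearest same-label
    samples / mean distance to the k nearest other-label samples), computed
    using only the samples listed in idx."""
    scores = {}
    for i in idx:
        same = sorted(math.dist(points[i], points[j])
                      for j in idx if j != i and labels[j] == labels[i])
        other = sorted(math.dist(points[i], points[j])
                       for j in idx if labels[j] != labels[i])
        if not other:
            scores[i] = 0.0           # label-pure partition: nothing contradicts i
        elif not same:
            scores[i] = math.inf      # i is the only sample of its label here
        else:
            ks, ko = min(k, len(same)), min(k, len(other))
            scores[i] = (sum(same[:ks]) / ks) / (sum(other[:ko]) / ko)
    return scores

def _split(points, idx):
    """Hypothetical partitioning rule: median cut on the widest coordinate."""
    def spread(d):
        values = [points[i][d] for i in idx]
        return max(values) - min(values)
    widest = max(range(len(points[idx[0]])), key=spread)
    ordered = sorted(idx, key=lambda i: points[i][widest])
    half = len(ordered) // 2
    return ordered[:half], ordered[half:]

def partition_rd_filter(points, labels, k=3, leaf_size=4, eval_size=0):
    """Keep the indices flagged (relative density > 1) in at most half of the
    partitions where they were scored. Partitions are split recursively until
    they hold fewer than 2 * leaf_size samples.
    eval_size=0 : score only the leaf partitions (single granularity).
    eval_size>0 : also score every larger partition holding at most eval_size
                  samples, so each sample is judged at several granularities."""
    votes = [0] * len(points)
    rounds = [0] * len(points)
    stack = [list(range(len(points)))]
    while stack:
        idx = stack.pop()
        is_leaf = len(idx) < 2 * leaf_size
        if is_leaf or len(idx) <= eval_size:
            for i, score in _rd(points, labels, idx, k).items():
                rounds[i] += 1
                votes[i] += int(score > 1.0)
        if not is_leaf:
            stack.extend(_split(points, idx))
    return [i for i in range(len(points)) if 2 * votes[i] <= rounds[i]]

# Hypothetical toy data: index 6 is a class-1 label inside the class-0 cluster.
POINTS = [(0, 0), (0, 0.2), (0.2, 0), (0.2, 0.2), (0.4, 0), (0.4, 0.2), (1, 1),
          (10, 10), (10, 10.2), (10.2, 10), (10.2, 10.2), (10.4, 10), (10.4, 10.2), (10.6, 10)]
LABELS = [0] * 6 + [1] * 8

print(partition_rd_filter(POINTS, LABELS))                # leaves only
print(partition_rd_filter(POINTS, LABELS, eval_size=14))  # leaves plus the root
```

With eval_size=0 only the leaf partitions are scored, which corresponds to the single-granularity fast method; raising eval_size also scores coarser partitions met on the way down the same recursive partitioning, which mirrors the abstract's point that several granularities can be evaluated during one complete partitioning. On this toy data both calls keep every index except 6.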
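
A granular ball can be pictured as a group of nearby samples summarized by its majority label. The sketch below is a simplified, hypothetical reading of granular ball sampling, not the thesis's exact procedure: balls are split with a deterministic 2-means until they are label-pure or a split would leave a ball smaller than min_size; minority-label samples inside each final ball are then discarded as likely noise, and of the remaining samples only the smallest and largest along each coordinate are kept, which compresses the data. granular_balls, granular_ball_sample, purity, min_size, the per-coordinate extreme-point rule, and the toy data are all assumptions.

```python
import math
from collections import Counter

def _centroid(pts):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(c) / len(pts) for c in zip(*pts))

def _two_means(points, idx, iters=10):
    """Deterministic 2-means: seeds are the sample farthest from the centroid
    and the sample farthest from that first seed."""
    c = _centroid([points[i] for i in idx])
    a = max(idx, key=lambda i: math.dist(points[i], c))
    b = max(idx, key=lambda i: math.dist(points[i], points[a]))
    centers = [points[a], points[b]]
    for _ in range(iters):
        groups = ([], [])
        for i in idx:
            closer_to_second = math.dist(points[i], centers[0]) > math.dist(points[i], centers[1])
            groups[int(closer_to_second)].append(i)
        if not groups[0] or not groups[1]:
            break  # degenerate split; the caller keeps the parent ball
        centers = [_centroid([points[i] for i in g]) for g in groups]
    return groups

def granular_balls(points, labels, purity=1.0, min_size=2):
    """Split balls until each reaches the purity threshold or a split would
    leave a child with fewer than min_size samples. Returns (indices, label)."""
    balls, queue = [], ([list(range(len(points)))] if points else [])
    while queue:
        idx = queue.pop()
        major, count = Counter(labels[i] for i in idx).most_common(1)[0]
        if count / len(idx) >= purity:
            balls.append((idx, major))
            continue
        left, right = _two_means(points, idx)
        if len(left) < min_size or len(right) < min_size:
            balls.append((idx, major))  # too small to split: majority label wins
        else:
            queue.extend([left, right])
    return balls

def granular_ball_sample(points, labels, purity=1.0, min_size=2):
    """Hypothetical sampling rule: drop minority-label samples in each ball and
    keep the smallest and largest remaining sample along every coordinate."""
    kept = set()
    for idx, major in granular_balls(points, labels, purity, min_size):
        members = [i for i in idx if labels[i] == major]
        for d in range(len(points[0])):
            kept.add(min(members, key=lambda i: points[i][d]))
            kept.add(max(members, key=lambda i: points[i][d]))
    return sorted(kept)

# Hypothetical toy data: index 6 is a class-1 label inside the class-0 cluster.
POINTS = [(0, 0), (0, 0.2), (0.2, 0), (0.2, 0.2), (0.4, 0), (0.4, 0.2), (1, 1),
          (10, 10), (10, 10.2), (10.2, 10), (10.2, 10.2), (10.4, 10), (10.4, 10.2), (10.6, 10)]
LABELS = [0] * 6 + [1] * 8

print(granular_ball_sample(POINTS, LABELS))  # [0, 1, 4, 7, 8, 13]
```

On this toy data the point at (1, 1) cannot be split off without creating a ball smaller than min_size, so it stays in the class-0 ball, is discarded as a minority label, and the 14 samples shrink to 6.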

Keywords/Search Tags:

label noise filtering, granular computing, relative density, space partitioning, granular ball sampling

Related items

1	Tolerance Granular Space And Its Applications
2	Granular Space And Granular Computing Of Information Systems Based On Binary Relation
3	The Research And Application Of Privacy Preserving Based On Granular Computing
4	Research On Partitioned Clustering Algorithm Based On Granular Computing And Density Peak
5	The Application Of Granular Computing In Clustering Analysis
6	The Research Of Intelligent Search Engine Technology Based On Granular Computing
7	Research Of Granular Computing And Extension Of Variable Precision Rough Set Theory Based On Pansystems Theory
8	Optimization And Application Of Granular Space Model
9	Studies On Granular Computing-based Of Text Classification Technology
10	An Image Filtering Algorithm Based On The Granular Computing Theory Of Quotient Space