
Research on the Estimation and Filtering Methods of Numerical Label Noise

Posted on: 2024-01-31
Degree: Master
Type: Thesis
Country: China
Candidate: P Qin
Full Text: PDF
GTID: 2568307115457544
Subject: Computer technology
Abstract/Summary:
In recent years, big data has developed rapidly, related applications continue to spread into various industries, and the value of big data is becoming more and more prominent. Because data sets are massive and drawn from a wide range of sources, practical applications generally contain a large amount of complex label noise. Studies have shown that label noise not only increases model complexity but may also lead to overfitting and unreliable predictions, so handling label noise effectively is very important.

Label noise can be classified as numerical label noise or categorical label noise. The complexity of the numerical label noise distribution poses a major challenge for noise processing and modeling. Existing label noise filtering methods mainly combine instance selection with a filtering algorithm. These methods can reduce the negative impact of label noise, but they still suffer from problems such as having too many parameters to select and excessive cleaning, so their filtering effect is limited. This thesis studies the problem of numerical label noise filtering, with the following specific contents:

(1) For numerical label noise in regression tasks, a limit distance noise estimation and filtering (LDNF) method is proposed. Unlike traditional noise filtering methods, LDNF theoretically studies the correlation between numerical label noise and the label estimation interval, turns the complex problem of estimating numerical label noise into the problem of measuring the distance between a sample and its label estimation interval, and verifies the feasibility of this approach through simulation experiments. Combined with the optimal sample selection framework, the method further improves the accuracy of label noise recognition and reduces the generalization error of the model.

(2) Because numerical label noise is complex, existing noise filtering methods are often not effective enough when dealing with it. Multi-granularity methods are robust and extensible. Therefore, the granular ball concept is introduced to process discrete regression data sets, and a granular ball label noise filtering (GBNF) method is proposed. This method not only reduces the sample size of the data set by removing redundant data and simplifying the data set, but can also filter label noise, thereby improving data quality.

(3) A numerical label noise filtering system is designed and developed based on the proposed label noise filtering methods. The system can test the performance of the label noise filtering methods integrated into it. In addition, after a local regression data set is opened and a noise filtering method is called, the filtered data set can be saved. The system can help fields such as artificial intelligence acquire low-cost, high-quality data, thereby improving the generalization ability of models.

This research provides new ideas for dealing with numerical label noise, enriches and develops the theory and methods of low-quality data modeling, and provides technical support for obtaining high-quality, low-cost data sets in practical scenarios.
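The idea in item (1), treating noise estimation as the distance between a sample's label and a label estimation interval, can be illustrated with a minimal sketch. The abstract does not say how LDNF builds the interval or sets its limit, so everything below is an assumption made for illustration: the function name interval_distance_filter, the use of the k nearest neighbors' label range as the interval, and the fixed threshold are all hypothetical and are not the thesis's algorithm.

```python
import numpy as np

def interval_distance_filter(X, y, k=5, threshold=1.0):
    """Illustrative only; NOT the thesis's LDNF algorithm.

    Assumption: the label estimation interval of a sample is the
    [min, max] range of the labels of its k nearest neighbors.
    A sample is kept when the distance from its own label to that
    interval is at most `threshold` (0 if the label lies inside).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    # Pairwise Euclidean distances between all samples.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbor

    # Indices and labels of the k nearest neighbors of each sample.
    neighbors = np.argsort(dist, axis=1)[:, :k]
    neighbor_labels = y[neighbors]  # shape (n_samples, k)

    # Hypothetical label estimation interval per sample.
    lower = neighbor_labels.min(axis=1)
    upper = neighbor_labels.max(axis=1)

    # Distance from each label to its interval.
    gap = np.maximum(lower - y, 0.0) + np.maximum(y - upper, 0.0)
    keep = gap <= threshold
    return keep, gap

# Usage: a clean linear relation y = 2x with one corrupted label.
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel()
y[4] = 100.0
keep, gap = interval_distance_filter(X, y, k=2, threshold=5.0)
X_clean, y_clean = X[keep], y[keep]
```

In this sketch the corrupted sample lies far outside the range of its neighbors' labels and is dropped, while samples whose neighborhoods merely contain the corrupted label stay within the threshold. A fixed threshold is the simplest possible choice; the abstract indicates that the thesis instead combines the estimate with an optimal sample selection framework.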
Keywords/Search Tags: Numerical label noise, Noise filtering, Label estimate interval, Granular ball noise filtering, Optimal Sample Selection Framework