Research On Reduction Solution Methods Of Data Noise Suppression Based On Guided Learning

Posted on: 2024-05-19
Degree: Master
Type: Thesis
Country: China
Candidate: Y Lu
Full Text: PDF
GTID: 2568307154495874
Subject: Master of Electronic Information (Professional Degree)
Abstract/Summary:
With the rapid development of network technology, the scale, dimensionality, and variety of data keep growing. Tens of millions or even billions of records are collected from real-world applications, producing large-scale data sets. The collected information is also mixed with a great deal of redundant and irrelevant information, and how to handle it effectively remains an open issue. Attribute reduction is one of the important ways to achieve feature selection, and it has developed alongside rough set theory. Unlike common feature selection techniques, attribute reduction not only deletes irrelevant or redundant attributes effectively but also offers a clear semantic interpretation. Although attribute reduction can effectively reduce the dimensionality of data, both the quality of the resulting reducts and the efficiency of computing them are unsatisfactory on large-scale and biased data. To address these shortcomings of traditional reduction strategies, this thesis analyzes them from several perspectives and proposes improvements. First, large-scale or high-dimensional data is analyzed from a local perspective, and a guiding idea is introduced. Second, biased data is studied from the perspective of multiple acquisitions and mean values. Finally, for complex data scenarios, a multi-scenario fast attribute reduction strategy is constructed that handles noise and large-scale data at the same time, in order to suppress data noise, improve performance on subsequent classification tasks, and reduce the time needed to derive a reduct. Specifically, the research content and contributions of this thesis cover the following points:

1. A label-specific guidance for efficiently searching for reducts is proposed. Although researchers have explored many strategies to accelerate the calculation of reducts, most of them work from the perspective of samples and attributes and do not make full use of the label information in the data. Therefore, the whole data set is divided into several parts according to its label information. Because the labels form a hierarchical structure, a guiding idea can be built on them to select appropriate conditional attributes: for each part of the data, the previously computed reduct guides the calculation of the next one. The reduct of the whole data set is then obtained naturally through this guided learning, which accelerates the process of calculating the reduct.

2. A data noise suppression method based on attribute reduction is proposed. Noise may be mixed into data during transmission and collection. Therefore, from the perspective of data acquisition, a large amount of original data is obtained through multiple acquisitions, and the changes in the data are studied and analyzed. After comparing processing methods such as binning, clustering, regression, and the mean value, the mean value method is selected to process the data. The attribute values of the acquired data are first stacked, and then either the arithmetic mean or a weighted average of the stacked results is taken. These strategies bring the data to a balance point and thereby reduce the data deviation.

3. A data noise suppression guidance for searching for reducts is proposed. To further enhance their effectiveness, the two strategies above are integrated. First, data is acquired through multiple acquisitions, the attribute values are stacked, and the stacked values are averaged with the arithmetic mean to suppress the noise in the data. Second, the noise-suppressed data is divided according to label information and sorted by the value of the measure calculated under each label. Based on the hierarchical structure of the labels, the guiding idea is then introduced: the reduct of the previous part of the data guides the reduct of the next part. This strategy can effectively suppress noise in the data and reduce the time needed to compute the reduct.
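
The integrated strategy described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation, and several parts are assumptions made only for illustration: the function names (`suppress_noise`, `dependency`, `guided_reduct`) are hypothetical, the measure is the classical rough set dependency degree over exact equivalence classes, attributes are discretized by rounding, each label is treated as a one-vs-rest decision, and labels are visited in a caller-supplied order rather than sorted by a measure.

```python
from collections import defaultdict

import numpy as np


def suppress_noise(acquisitions, weights=None):
    """Stack repeated acquisitions of the same samples and average them.

    acquisitions: sequence of arrays, each shaped (n_samples, n_attributes).
    weights: optional per-acquisition weights (weighted average if given).
    """
    stacked = np.stack(acquisitions, axis=0)
    return np.average(stacked, axis=0, weights=weights)


def dependency(X, decision, attrs):
    """Fraction of samples whose equivalence class under attrs is decision-pure.

    Assumption: classical rough set dependency on already-discrete values.
    """
    classes = defaultdict(list)
    for row, d in zip(X[:, attrs].tolist(), decision.tolist()):
        classes[tuple(row)].append(d)
    consistent = sum(len(ds) for ds in classes.values() if len(set(ds)) == 1)
    return consistent / len(decision)


def guided_reduct(X, y, labels):
    """Greedy forward search in which each label starts from the previous reduct."""
    all_attrs = list(range(X.shape[1]))
    reduct = []  # carried over from one label to the next (the "guidance")
    for label in labels:
        decision = (y == label)  # one-vs-rest decision for this label
        target = dependency(X, decision, all_attrs)
        while dependency(X, decision, reduct) < target:
            candidates = [a for a in all_attrs if a not in reduct]
            best = max(candidates,
                       key=lambda a: dependency(X, decision, reduct + [a]))
            reduct.append(best)
    return reduct


# Toy data: attribute 2 is constant and therefore irrelevant.
clean = np.array([[0, 0, 5], [0, 1, 5], [1, 0, 5], [1, 1, 5]], dtype=float)
y = np.array([0, 0, 1, 2])
noise = np.full(clean.shape, 0.3)

# Two acquisitions with opposite deviations average back to the clean values.
averaged = suppress_noise([clean + noise, clean - noise])
print(guided_reduct(np.round(averaged), y, labels=[0, 1, 2]))  # [0, 1]
```

In this toy run, label 0 is already separated by attribute 0 alone, so the search for label 1 only has to add attribute 1 to the inherited reduct instead of starting from an empty set; this reuse is where the guided strategy would save time under the stated assumptions.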
Keywords/Search Tags:Attribute Reduction, Data noise suppression, Label-specific, Guiding learning, Rough Set