Resampling Learning For Data Error And Data Privacy-Preserving | Posted on: 2023-04-22 | Degree: Doctor | Type: Dissertation | Country: China | Candidate: Q Y Wang | GTID: 1528307169977569 | Subject: Management Science and Engineering | Abstract/Summary:

Information management and information systems is an important scientific field that treats information as its core resource and information technology as its tool for solving economic and management problems in a social setting. It has three main dimensions: information, technology, and management. With the rapid development of information technology, the scale of data has exploded, and so have the complex interactions among data. These relationships hold rich value. However, people's understanding of data is incomplete and inconsistent, which distorts the relationships among the collected data patterns and introduces a large number of data errors. These errors seriously disrupt model learning and lead to overfitting. Resampling learning can effectively mitigate the overfitting caused by such errors and thereby improve model performance. This dissertation studies classification under data errors combined with resampling learning, covering resampling algorithm design at the data level, model improvement at the algorithm level, and interpretability analysis of the experimental results. It also studies the design of resampling learning strategies, and an optimization method for re-matching and aligning data sequences, for data noise and class imbalance under data privacy protection. The main work is as follows:

(1) For resampling learning under data error, two strategies are proposed. To handle data noise, this dissertation proposes self-paced resampling learning based on random forest: during random forest training, a sampling strategy moves from high-quality samples to low-quality
samples. To handle class imbalance, it proposes an adaptive class-imbalanced resampling learning strategy that selects high-quality majority-class samples to improve model performance: over multiple training iterations, an appropriate subset of high-quality samples is selected and recombined with the labeled minority-class samples. In many experiments, these resampling strategies effectively improve the classification performance of the model. Gene detection in medical diagnosis is also used as a case for interpreting the proposed model.

(2) To address class imbalance and the insufficient robustness of resampling learning methods to data heterogeneity, this dissertation proposes a multiview feature class-imbalanced meta-self-paced resampling method, called M~2SPL. It separates adjacent redundant features and generates multiview feature subsets, so that the different views can help one another during multiview learning and improve performance. In addition, M~2SPL sets the initial parameters of self-paced resampling learning automatically, and it selects high-quality samples and discards highly noisy ones to improve the performance and robustness of the trained model. Several experimental results show that M~2SPL outperforms the compared imbalanced learning algorithms and handles imbalanced classification problems efficiently.

(3) To address samples distributed across different clients together with data noise, this dissertation proposes a privacy-preserving federated resampling learning method named Fed SPL. The method uses federated learning to train a shared global model and connects the scattered training data in a privacy-protecting way to improve global performance. Fed SPL not only reduces the risk of data privacy leakage but also selects high-confidence samples and removes highly noisy samples to improve the
performance of the trained model. Several experimental results show that the proposed method significantly improves model performance. This dissertation also analyzes the interpretability of the model results.

(4) To address features distributed across different clients together with class imbalance, this dissertation proposes a vertical federated class-imbalanced self-paced resampling learning method. Because the data features are distributed across clients, the learning parameters are transmitted and combined under privacy protection, with the aim of improving the performance of the global model. Resampling learning is used to find a more suitable data sequence space for the original imbalanced dataset. Each client selects high-quality samples by resampling learning and then aligns the encrypted sample IDs, so that the global learning model receives evenly distributed samples. In addition, the global model uses only part of each client's samples, which improves efficiency and reduces the risk of exposing data samples. Finally, several experimental results demonstrate that the proposed method effectively improves model performance.

Keywords/Search Tags: Resampling learning, Data noise, Data class-imbalance, Data security, Privacy-preserving, Self-paced learning, Federated learning
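
The class-imbalanced self-paced resampling idea in item (1) of the abstract (train on high-quality majority-class samples first, always keep the minority class, and admit harder samples over successive iterations) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the dissertation's algorithm: the base learner here is a small logistic regression rather than a random forest, sample quality is approximated by per-sample loss with the standard hard-threshold self-paced rule, and every name and default (`self_paced_mask`, `self_paced_train`, `lam`, `growth`, `rounds`) is hypothetical.

```python
import numpy as np


def self_paced_mask(losses, lam):
    """Hard self-paced rule: a sample counts as 'easy' if its loss is below lam."""
    return losses < lam


def fit_logistic(X, y, steps=200, lr=0.1):
    """Gradient-descent logistic regression, used here as a stand-in base learner."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w


def per_sample_loss(w, X, y):
    """Binary cross-entropy of each sample under the current weights."""
    p = np.clip(1.0 / (1.0 + np.exp(-X @ w)), 1e-12, 1 - 1e-12)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))


def self_paced_train(X, y, minority=1, lam=0.5, growth=1.3, rounds=5):
    """Retrain on easy majority samples plus all minority samples, relaxing lam each round."""
    w = fit_logistic(X, y)  # warm start on the full dataset
    kept_counts = []
    for _ in range(rounds):
        easy = self_paced_mask(per_sample_loss(w, X, y), lam)
        keep = easy | (y == minority)  # minority-class samples are always kept
        if np.unique(y[keep]).size == 2:  # refit only when both classes are present
            w = fit_logistic(X[keep], y[keep])
        kept_counts.append(int(keep.sum()))
        lam *= growth  # admit harder samples in the next round
    return w, kept_counts
```

Calling `self_paced_train(X, y)` with a 0/1 label vector returns the final weights and the number of samples kept in each round. The threshold `lam` and its growth factor control how quickly low-quality samples are admitted, and the values shown are arbitrary placeholders.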