| Classifying imbalanced data is a long-standing topic in machine learning and pattern recognition, with wide application in credit assessment, medical diagnosis, and intrusion detection. In imbalanced classification, the minority class often carries the more valuable information, such as abnormal credit card transactions or unusual internet usage behavior. Studies of imbalanced data therefore pay particular attention to classification results on the minority class. However, because the class sizes are imbalanced and the imbalance is often accompanied by class overlap, high dimensionality, and small disjuncts, the minority class is harder to classify and its classification results are correspondingly worse. This thesis studies two-class imbalanced data to address these issues, and the main contributions are as follows: (1) To address the loss of valuable information in under-sampling, a mixed sampling method based on safe sample screening is proposed. The method combines under-sampling based on safe sample screening with over-sampling: it first screens the imbalanced data to retain the samples that are valuable for determining the classification boundary, and then over-samples so that the data set is approximately balanced. Experimental results show that the method is effective. (2) For data that are both high-dimensional and imbalanced, a mixed sampling method based on double screening of safe features and safe samples is proposed. The method exploits the synergy between safe feature screening and safe sample screening, applying both to high-dimensional imbalanced data. It discards the samples and features that are not valuable for determining the classification boundary, which also reduces the dimensionality of the data set. On
this basis, over-sampling is then performed to balance the data set. Experiments demonstrate the effectiveness of the method. (3) The mixed sampling method is applied to imbalanced data in intrusion detection. Intrusion detection identifies intrusions by classifying data on users' network behavior, and intrusion data are markedly imbalanced. The re-sampling method proposed in this thesis is therefore applied to sample and preprocess the intrusion detection data, and experiments further verify its feasibility in practical use. After the imbalanced data are preprocessed, more minority-class samples are correctly classified and classification performance on imbalanced data improves. This research has both theoretical and practical significance for the study of imbalanced classification problems. |
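
The combination of screening-based under-sampling and over-sampling described in contribution (1) can be sketched roughly as follows. This is a minimal illustration in pure Python, not the method of the thesis: the abstract does not state the safe sample screening rule, so the sketch assumes a crude stand-in (keep the majority samples nearest to the minority class) and a simple interpolation-based over-sampler. All function names and the `keep_ratio` parameter are hypothetical.

```python
import math
import random


def nearest_distance(point, others):
    """Euclidean distance from `point` to its nearest neighbour in `others`."""
    return min(math.dist(point, other) for other in others)


def screen_majority(majority, minority, keep_ratio=0.5):
    """Under-sample: keep the majority samples closest to the minority class.

    Assumption: distance to the minority class is only a rough proxy for
    "valuable for determining the classification boundary". It is NOT the
    safe sample screening rule used in the thesis.
    """
    ranked = sorted(majority, key=lambda p: nearest_distance(p, minority))
    # Never keep fewer majority samples than there are minority samples.
    n_keep = max(len(minority), int(len(majority) * keep_ratio))
    return ranked[:n_keep]


def oversample_minority(minority, target_size, rng):
    """Over-sample: add points interpolated between random minority pairs."""
    samples = list(minority)
    while len(samples) < target_size:
        a, b = rng.choice(minority), rng.choice(minority)
        t = rng.random()
        samples.append(tuple(ai + t * (bi - ai) for ai, bi in zip(a, b)))
    return samples


def mixed_sampling(majority, minority, keep_ratio=0.5, seed=0):
    """Screen the majority class, then grow the minority class to match it."""
    rng = random.Random(seed)
    kept = screen_majority(majority, minority, keep_ratio)
    balanced_minority = oversample_minority(minority, len(kept), rng)
    return kept, balanced_minority


# Toy two-class data: 200 majority points and 20 minority points in 2-D.
data_rng = random.Random(42)
majority = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)]
minority = [(data_rng.gauss(2, 0.5), data_rng.gauss(2, 0.5)) for _ in range(20)]

kept, balanced = mixed_sampling(majority, minority)
print(len(kept), len(balanced))
```

On this toy data the call keeps 100 of the 200 majority points and grows the minority class from 20 to 100 points, so the two classes end up the same size. In practice the screening step would be replaced by the actual safe sample screening criterion.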