With the advancement of technology, data collection and processing have become increasingly effortless across industries. Swiftly sifting out the information latent in these data can not only significantly enhance the technical capability of intelligent data processing in various industries, but also provide substantial support for the development of related industries. As data volumes grow, data distributions increasingly tend toward imbalance, and the classes with few samples are often precisely the ones of greatest research interest. In medical diagnosis, spam filtering, bank-card fraud prevention, and similar applications, imbalanced distributions are common; handling them effectively makes it possible to discover and predict potential risks in time, which is of significant scientific and practical value. Existing classifiers, however, often fail to achieve satisfactory recognition rates on imbalanced datasets: traditional classification models are usually trained on balanced datasets in order to obtain high classification accuracy, and their performance degrades when the data are imbalanced.

To address this, this paper first proposes a hybrid sampling method based on boundary information fusion clustering, derived from an analysis of the distribution patterns of imbalanced datasets. The method defines the concept of a boundary point and preserves the boundary points of the majority-class samples accordingly. The remaining majority-class samples are then under-sampled, and the Borderline-SMOTE method is applied to over-sample the under-sampled dataset, yielding the final training set. Combining this sampling method with traditional classifiers in experiments on multiple public datasets shows that it effectively improves the classification accuracy on imbalanced data.

Second, drawing on the ideas of the reward function and cumulative reward in reinforcement learning, an improved decision tree algorithm integrating a reinforcement learning mechanism is proposed. Because minority-class samples are easily misclassified in imbalanced data classification, the criterion for selecting the splitting attribute at each node of the decision tree is adjusted so that the algorithm pays more attention to minority-class samples during splitting, raising the probability that minority-class samples are classified correctly. Comparison experiments against the original decision tree algorithm show that the proposed algorithm improves recall and the G-mean metric.

Finally, the improved decision tree algorithm is used as the base classifier of the AdaBoost algorithm and combined with the proposed hybrid sampling method to obtain the final algorithm. The proposed algorithm is tested on several public datasets, and the experimental results verify its effectiveness.
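The three-step hybrid sampling procedure described above can be sketched as follows. This is a minimal illustration, not the implementation evaluated in the paper: the k-nearest-neighbour boundary-point test, the `keep_ratio` under-sampling rate, and the simple interpolation step standing in for the full Borderline-SMOTE algorithm are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def knn_indices(X, k):
    # indices of each sample's k nearest neighbours (self excluded)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    return np.argsort(d, axis=1)[:, 1:k + 1]

def boundary_mask(X, y, k=5):
    # a sample is a boundary point if any of its k nearest
    # neighbours belongs to a different class
    nn = knn_indices(X, k)
    return np.array([(y[nn[i]] != y[i]).any() for i in range(len(X))])

def hybrid_sample(X, y, k=5, keep_ratio=0.5):
    maj, mino = (0, 1) if (y == 0).sum() >= (y == 1).sum() else (1, 0)
    b = boundary_mask(X, y, k)

    # step 1: preserve all majority-class boundary points
    keep = (y == maj) & b
    # step 2: under-sample the remaining (interior) majority points
    interior = np.flatnonzero((y == maj) & ~b)
    keep[rng.choice(interior, size=int(len(interior) * keep_ratio),
                    replace=False)] = True
    keep |= (y == mino)  # minority samples are always kept

    Xs, ys = X[keep], y[keep]

    # step 3: Borderline-SMOTE-style over-sampling -- interpolate new
    # minority samples between "danger" (boundary) minority points and
    # other minority points until the two classes are balanced
    need = int((ys == maj).sum() - (ys == mino).sum())
    danger = np.flatnonzero((y == mino) & b)
    if danger.size == 0:
        danger = np.flatnonzero(y == mino)
    mino_all = np.flatnonzero(y == mino)
    synth = [X[i] + rng.random() * (X[j] - X[i])
             for i, j in zip(rng.choice(danger, size=max(need, 0)),
                             rng.choice(mino_all, size=max(need, 0)))]
    if synth:
        Xs = np.vstack([Xs, np.asarray(synth)])
        ys = np.concatenate([ys, np.full(len(synth), mino)])
    return Xs, ys
```

On a toy imbalanced dataset the function returns a class-balanced sample: majority boundary points survive intact, interior majority points are thinned, and synthetic minority points are generated near the class boundary, which is where misclassification is most likely.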
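The reward-adjusted splitting criterion can likewise be sketched in miniature. The idea is that each class contributes to the node impurity in proportion to an assigned reward rather than its raw count, so a larger reward for the minority class pulls splits toward isolating minority samples. The per-class reward values and the exhaustive single-feature threshold search below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def reward_gini(y, rewards):
    # Gini impurity with reward-weighted class proportions: a class's
    # share is rewards[c] * count(c), normalised over the node
    classes = np.unique(y)
    w = np.array([rewards[c] * (y == c).sum() for c in classes], float)
    p = w / w.sum()
    return 1.0 - (p ** 2).sum()

def best_split(X, y, rewards):
    # exhaustive search over per-feature thresholds, minimising the
    # size-weighted reward-Gini of the two children
    best = (None, None, np.inf)
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f])[:-1]:
            left = X[:, f] <= t
            score = (left.mean() * reward_gini(y[left], rewards)
                     + (~left).mean() * reward_gini(y[~left], rewards))
            if score < best[2]:
                best = (f, t, score)
    return best
```

With equal rewards this reduces to the classical Gini criterion; raising the minority class's reward inflates its apparent proportion inside mixed nodes, so such nodes look more impure and the search prefers splits that separate minority samples cleanly, mirroring the recall and G-mean gains the abstract reports.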