
Research On Imbalanced Data Classification In Financial Field

Posted on: 2022-10-04
Degree: Master
Type: Thesis
Country: China
Candidate: Y Liu
GTID: 2518306548499844
Subject: Computer technology
Abstract/Summary:
In today's Internet era, the rapid development of Internet technology and the economy has greatly changed how the financial field is managed, and Internet finance has become an important part of the financial industry. As market competition intensifies, the massive data generated in the financial field has become a special kind of asset whose commercial and economic value is widely recognized by enterprises and industries. How to use this data effectively through data mining is therefore an important issue in the financial field. Financial data mining can improve the service level and market competitiveness of financial enterprises, reduce management risk, and promote financial innovation, so that the data delivers its maximum value. However, one of the main problems in financial data mining is class imbalance. Imbalanced data is common in many industries, especially finance; for example, fraudulent transactions are far fewer than normal transactions. Imbalanced data classification has therefore become a research hotspot in financial data mining.

In recent years, one-class classification has become one of the main approaches to imbalanced data classification. A one-class classification algorithm needs only one class of data as training samples, which can effectively reduce the computational cost of classification. However, existing one-class classification algorithms still have shortcomings: they cannot effectively handle overlapping class distributions, they have poor noise resistance, and they are sensitive to parameters. To address these shortcomings, this thesis studies imbalanced data classification from the perspective of one-class classification and applies it in the financial field. First, a two-stage classification algorithm, TSC-OSK, based on OC-SVM (One-Class Support Vector Machine) and KNN (K-Nearest Neighbor) is proposed for classifying imbalanced data. Second, building on TSC-OSK, an enhanced two-stage classification algorithm, ETSC-BOSK, based on BIRCH clustering, OC-SVM and KNN is proposed; compared with TSC-OSK, ETSC-BOSK performs better on imbalanced data. Third, both algorithms are applied to multiple imbalanced datasets from the financial field to verify their effectiveness for imbalanced data classification in that field.

The main research work of this thesis is as follows:

(1) Two-stage imbalanced data classification algorithm based on OC-SVM and KNN
The traditional OC-SVM algorithm handles boundary and outlier samples in imbalanced data poorly: it cannot effectively deal with overlapping class distributions and has poor noise resistance. To solve these problems, this thesis proposes TSC-OSK, a two-stage imbalanced data classification algorithm based on OC-SVM and KNN. First, TSC-OSK constructs two OC-SVM classifiers, one fitted to the majority class samples and one fitted to the minority class samples in the training set. Second, in the first stage, both OC-SVM classifiers classify the test samples, and their results are combined so that they verify each other. According to the first-stage results, all samples are divided into four types: majority class, minority class, boundary and outlier. Third, in the second stage, the KNN algorithm classifies the boundary and outlier samples, which avoids the prediction bias of OC-SVM on these samples. Experiments on several imbalanced datasets show that TSC-OSK performs well in imbalanced data classification in various fields.

(2) Enhanced two-stage imbalanced data classification algorithm based on BIRCH clustering, OC-SVM and KNN
Building on the TSC-OSK algorithm proposed in (1), an enhanced two-stage imbalanced data classification algorithm, ETSC-BOSK, based on BIRCH clustering, OC-SVM and KNN is proposed. First, ETSC-BOSK uses BIRCH clustering to divide the majority and minority sample subsets of the training set into multiple clusters and constructs an OC-SVM classifier on each cluster, yielding several majority class cluster detectors and several minority class cluster detectors. Second, the maximum fusion volume method is used to fuse the decision boundaries of the majority class cluster detectors and of the minority class cluster detectors separately, yielding one majority class enhanced detector and one minority class enhanced detector. Third, in the first stage, the two enhanced detectors classify the test samples, and their results are combined so that they verify each other. According to the first-stage results, all samples are divided into four types: majority class, minority class, boundary and outlier. Fourth, in the second stage, the KNN algorithm classifies the boundary and outlier samples. Experiments on several imbalanced datasets show that ETSC-BOSK outperforms TSC-OSK and other representative classification algorithms.

(3) Application of the two imbalanced data classification algorithms in the financial field
Targeting the practical problems of financial fraud detection, financial precision marketing analysis and credit default prediction, we study the relevant application scenarios and datasets in depth and select several representative datasets from these problems for experimental verification. We apply the TSC-OSK and ETSC-BOSK algorithms proposed in (1) and (2) to these datasets. The experimental results show that the two algorithms achieve better results in financial fraud detection, financial precision marketing analysis and credit default prediction, and thus effectively address imbalanced data classification in the financial field.
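
The two-stage procedure described in (1) can be illustrated with a minimal Python sketch. It is not the thesis implementation and rests on several assumptions: a simple centroid-radius detector (`centroid_detector`, a hypothetical helper) stands in for OC-SVM so that the example runs with the standard library only; the mapping from the two detectors' outputs to the four sample types (accepted by both means boundary, accepted by neither means outlier) is an interpretation of the abstract, which does not spell it out; and the label encoding, the `quantile` and `k` parameters and the toy data are all illustrative.

```python
import math
from collections import Counter

MAJORITY, MINORITY = 0, 1  # assumed label encoding, not taken from the thesis


def centroid_detector(samples, quantile=0.9):
    """Hypothetical stand-in for OC-SVM: accept points close to the class centroid."""
    dim = len(samples[0])
    center = [sum(p[i] for p in samples) / len(samples) for i in range(dim)]
    dists = sorted(math.dist(p, center) for p in samples)
    # Radius is the training distance at the requested quantile.
    radius = dists[min(len(dists) - 1, int(quantile * len(dists)))]
    return lambda x: math.dist(x, center) <= radius


def stage_one(x, accepts_majority, accepts_minority):
    """Combine both detectors into one of four sample types (assumed mapping)."""
    in_maj, in_min = accepts_majority(x), accepts_minority(x)
    if in_maj and not in_min:
        return "majority"
    if in_min and not in_maj:
        return "minority"
    return "boundary" if in_maj else "outlier"


def knn_predict(x, X, y, k=3):
    """Plain majority vote among the k nearest training samples."""
    nearest = sorted(range(len(X)), key=lambda i: math.dist(x, X[i]))[:k]
    return Counter(y[i] for i in nearest).most_common(1)[0][0]


def fit_two_stage(X, y, k=3, quantile=0.9):
    """Fit one detector per class and return a predictor giving (label, type)."""
    det_maj = centroid_detector([p for p, c in zip(X, y) if c == MAJORITY], quantile)
    det_min = centroid_detector([p for p, c in zip(X, y) if c == MINORITY], quantile)

    def predict(x):
        kind = stage_one(x, det_maj, det_min)
        if kind == "majority":
            return MAJORITY, kind
        if kind == "minority":
            return MINORITY, kind
        # Second stage: boundary and outlier samples are decided by KNN.
        return knn_predict(x, X, y, k), kind

    return predict


# Toy imbalanced data: 8 majority points near (0, 0), 4 minority points near (5, 5).
X = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1), (1, 1), (-1, -1), (0.5, 0.5),
     (5, 5), (6, 5), (5, 6), (5.5, 5.5)]
y = [MAJORITY] * 8 + [MINORITY] * 4
predict = fit_two_stage(X, y)

print(predict((0.2, 0.1)))  # accepted by the majority detector only
print(predict((4.0, 4.0)))  # rejected by both detectors, decided by KNN
```

In practice the stand-in detector would be replaced by a real one-class SVM (for example `sklearn.svm.OneClassSVM`, if scikit-learn is available), with the routing logic in `stage_one` left unchanged.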
Keywords/Search Tags: financial data mining, imbalanced data classification, OC-SVM, KNN, boundary samples and outlier samples, BIRCH