Processing And Identification Methods Of Imbalanced Financial Transaction Data

Posted on: 2022-02-01
Degree: Master
Type: Thesis
Country: China
Candidate: X P Feng
GTID: 2518306740495234
Subject: Computer technology
Abstract/Summary:
With the rapid development of information technology, the highly developed financial transaction environment has lowered users' transaction costs, multiplied the available means of transacting, and raised transaction frequency. Financial transactions have therefore become real-time and ubiquitous. Financial transaction data now fully exhibits the characteristics of big data (large volume, fast growth, variety, and high value) and has become financial transaction big data. At the same time, it raises a new problem: financial transaction security. Analyzing and detecting abnormal transactions within this enormous volume of transactions is an important and difficult topic. On the one hand, financial transaction data is extremely large, and many traditional methods are increasingly unsuited to processing and analyzing massive high-frequency data, so big data technology is required. On the other hand, the data is highly imbalanced: abnormal transactions make up only a small proportion of all transaction records. This hinders data analysis and classification, and there is still no complete classification system for processing and detecting imbalanced financial transaction data.

To address these problems, this thesis first reviews the theory of financial transaction big data and of imbalanced data. It then analyzes how data imbalance affects classification and why imbalanced financial transaction data must be handled. It also surveys methods for analyzing financial transaction data and for processing imbalanced data. From two perspectives, oversampling and feature selection, it proposes two methods for processing and classifying imbalanced financial data. Studying how to combine these methods leads to the KS-GA method framework, which finally applies a machine learning classification model to the financial transaction data to identify abnormal transactions effectively.

The framework first synthesizes minority samples with KM-SMOTE, a modified SMOTE method. KM-SMOTE clusters the samples of the whole dataset and uses a defined strategy to find safe regions across all clusters. It then synthesizes minority samples only within those safe regions. In principle, this avoids problems such as noise and blurred class boundaries that methods like SMOTE can introduce, and it does not amplify the influence of isolated points.

Second, this thesis proposes FSGA, a feature selection method designed around a genetic algorithm. FSGA encodes each possible feature selection result as an individual in the genetic algorithm and generates populations through genetic operations. It measures the quality of each individual by its fitness. A continuous global search then finds the individual with the best fitness, that is, the best feature subset. Validation experiments on UCI datasets demonstrate the usefulness of both methods.

Building on these methods, the thesis uses experiments and theoretical analysis to determine the correct way to combine them. Pairing them with machine learning classification models forms the KS-GA method framework. In the final practical application, experiments on real credit card transaction data, using two classification models and different parameter settings, demonstrate the method's usefulness for imbalanced financial transaction data.
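The abstract describes KM-SMOTE only at a high level. The method clusters the whole dataset, keeps the clusters judged "safe", and runs SMOTE-style interpolation only inside them. Below is a minimal standard-library Python sketch of that idea under stated assumptions; it is not the thesis's implementation. The safety rule is hypothetical and loosely modeled on K-Means SMOTE: a cluster counts as safe if it holds at least two minority samples and the minority share reaches `safe_ratio`. The uniform choice among safe clusters is also an assumption. The names `kmeans`, `km_smote`, `safe_ratio` and `k_neighbors` are likewise hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, rng: random.Random, iters: int = 20) -> List[int]:
    """Plain Lloyd's k-means; returns a cluster label for every point."""
    centers = rng.sample(list(points), k)  # k distinct data points as initial centers
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def km_smote(X: Sequence[Point], y: Sequence[int], n_new: int, k: int = 2,
             safe_ratio: float = 0.5, k_neighbors: int = 3, seed: int = 0) -> List[Point]:
    """Synthesize n_new minority (label 1) samples, interpolating only inside 'safe' clusters."""
    rng = random.Random(seed)
    labels = kmeans(X, k, rng)
    safe_clusters: List[List[int]] = []  # minority indices of each safe cluster
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        minority = [i for i in members if y[i] == 1]
        # ASSUMED safety rule: >= 2 minority points and the minority share reaches safe_ratio.
        if len(minority) >= 2 and len(minority) / len(members) >= safe_ratio:
            safe_clusters.append(minority)
    synthetic: List[Point] = []
    if not safe_clusters:
        return synthetic  # no safe region, so nothing is synthesized
    for _ in range(n_new):
        cluster = rng.choice(safe_clusters)  # ASSUMED: uniform choice among safe clusters
        i = rng.choice(cluster)
        # SMOTE step restricted to the cluster: pick one of i's nearest minority neighbours there.
        neighbours = sorted((j for j in cluster if j != i),
                            key=lambda j: math.dist(X[i], X[j]))[:k_neighbors]
        j = rng.choice(neighbours)
        gap = rng.random()  # position on the segment from X[i] to X[j]
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(X[i], X[j])))
    return synthetic


if __name__ == "__main__":
    majority = [(float(i % 5), float(i // 5)) for i in range(20)]  # 5x4 grid near the origin
    blob = [(20.0 + i % 3, 20.0 + i // 3) for i in range(6)]       # compact minority cluster
    isolated = [(2.0, 1.5)]                                         # minority point among majority
    X = majority + blob + isolated
    y = [0] * 20 + [1] * 6 + [1]
    for point in km_smote(X, y, n_new=5):
        print(point)
```

In this toy data, the isolated minority point falls into a majority-dominated cluster, so it is never used as an interpolation endpoint. Plain SMOTE, by contrast, could interpolate between that point and the minority blob and scatter synthetic samples across the majority region. That is the kind of noise the abstract says KM-SMOTE is meant to avoid.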
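FSGA is also described only in outline: feature subsets are individuals, genetic operations produce new populations, and fitness ranks them. The Python sketch below is a generic genetic-algorithm feature selector in that spirit. Each individual is a bit mask over the features, and binary tournament selection, single-point crossover and bit-flip mutation generate each new population. Elitism keeps the fittest subset found so far. These operators, the rates, and the names `ga_feature_selection` and `toy_fitness` are assumptions, not the thesis's design. The fitness function is left pluggable because the abstract does not say how FSGA scores individuals. In practice it might be a classifier's validation F1 or AUC on the selected columns.

```python
import random
from typing import Callable, Dict, List, Tuple

Mask = Tuple[int, ...]  # bit i == 1 means feature i is kept


def ga_feature_selection(n_features: int, fitness: Callable[[Mask], float],
                         pop_size: int = 20, generations: int = 30,
                         crossover_rate: float = 0.8, mutation_rate: float = 0.05,
                         seed: int = 0) -> Tuple[Mask, float]:
    """Genetic search over feature subsets; returns the fittest mask found and its fitness."""
    rng = random.Random(seed)
    cache: Dict[Mask, float] = {}

    def score(mask: Mask) -> float:
        if mask not in cache:  # fitness may be expensive (e.g. training a classifier)
            cache[mask] = fitness(mask)
        return cache[mask]

    def repair(bits: List[int]) -> Mask:
        if not any(bits):  # an empty subset is invalid: switch one feature back on
            bits[rng.randrange(n_features)] = 1
        return tuple(bits)

    def tournament(population: List[Mask]) -> Mask:
        a, b = rng.sample(population, 2)  # binary tournament selection
        return a if score(a) >= score(b) else b

    population = [repair([rng.randint(0, 1) for _ in range(n_features)])
                  for _ in range(pop_size)]
    best = max(population, key=score)
    for _ in range(generations):
        children = [best]  # elitism: the best subset so far survives unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(population), tournament(population)
            if n_features > 1 and rng.random() < crossover_rate:
                cut = rng.randrange(1, n_features)  # single-point crossover
                bits = list(p1[:cut] + p2[cut:])
            else:
                bits = list(p1)
            for i in range(n_features):  # bit-flip mutation
                if rng.random() < mutation_rate:
                    bits[i] ^= 1
            children.append(repair(bits))
        population = children
        best = max(population, key=score)
    return best, score(best)


if __name__ == "__main__":
    relevant = {0, 2}  # toy stand-in: features 0 and 2 are the "useful" ones

    def toy_fitness(mask: Mask) -> float:
        kept_relevant = sum(mask[i] for i in relevant)
        kept_other = sum(bit for i, bit in enumerate(mask) if i not in relevant)
        return kept_relevant - 0.1 * kept_other  # penalize every irrelevant kept feature

    print(ga_feature_selection(6, toy_fitness))
```

Caching fitness values matters once the fitness means training and validating a classifier. Elitism guarantees that the best fitness never decreases from one generation to the next.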
Keywords/Search Tags: financial security, imbalanced data, oversampling, genetic algorithm, machine learning