
Research On The Application Of Data Mining Technology In Detecting And Identifying The Medical Insurance Fraud

Posted on: 2019-04-09
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Dou
GTID: 2404330566480048
Subject: Computer software and theory
Abstract/Summary:
With the reform of China's medical and health system, both the coverage of medical insurance and the number of participants are expanding. At the same time, fraud and irregularities are increasing, and their forms and means keep changing. This not only seriously affects the safe and stable development of the medical insurance fund but also does enormous harm to society. Medical insurance data has the characteristics of big data, such as large volume, diversity and rapid generation, so traditional methods of detecting and identifying fraud suffer from heavy workload, low efficiency and proneness to error. Using these data to detect fraud automatically and efficiently is therefore of practical significance. This paper applies data mining technology to massive medical insurance records in order to detect and identify medical insurance fraud. The main work is as follows:

(1) To address cases in which the charge for a single prescription is too high, and given that medical insurance fraud is hard to imitate and learn, this paper proposes a Euclidean distance based on index weights to describe the similarity among records. Most existing related work uses supervised learning methods, such as statistical approaches and neural networks, which usually require fraud data to be labeled manually. In practice, however, medical insurance records have the characteristics of big data: the volume is large and prior knowledge is lacking. This paper therefore first analyzes the preprocessed data by principal component analysis, calculating the principal component rotation matrix and the cumulative variance contribution rate to obtain a weight value for each original index. It then defines a weight factor to obtain the weight of each original index, performs clustering and obtains the isolated points. The experimental results show that the weights obtained by the proposed algorithm are close to those obtained by the algorithms in the literature, and that the suspected fraud records detected are also similar. At the same time, the proposed algorithm greatly reduces execution time and improves the efficiency of data processing.

(2) For the medical insurance fraud records that have been identified, the task is transformed into a binary classification problem. Using the idea of parameter optimization, this paper optimizes the penalty factor c of the support vector machine (SVM) and the parameter g of the radial basis function, and proposes an improved GASAAPSO_SVM algorithm. Genetic algorithm (GA) and particle swarm optimization (PSO) are used to determine the optimal solution of each group during execution. At the same time, a harmonic inertia factor is introduced to improve PSO, and the Metropolis criterion of the simulated annealing algorithm is used to improve the late-stage local search ability of PSO. The global optimal solution found in this way is used as the parameter input of the SVM, which is then trained on the data set to obtain the SVM classification model.

(3) To make the evaluation more objective, this paper measures the algorithm with more informative evaluation indexes, including classification accuracy, recall rate, sensitivity, the difference of accuracy, AUC and so on. The algorithm was first applied to the breast cancer data set of Wisconsin Medical College, USA. The experimental results show that its accuracy, recall rate, sensitivity, F-measure and other indicators are all higher than those of the compared algorithms when 60%, 70% and 80% of the data are used for training, and that classification accuracy is also improved to some extent. Finally, an experiment is carried out on the data set of medical insurance fraud records, mainly analyzing the overall classification effect of the proposed method and discussing the differences among the parameters obtained by the different algorithms. The experimental results show that the proposed algorithm achieves higher classification accuracy in classifying and recognizing medical insurance fraud records and better results on the other evaluation indicators, which indicates that the method is feasible and effective for classifying and identifying medical insurance fraud records.
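
The index-weighted Euclidean distance and the isolated-point step described in (1) might be sketched as follows. This is a minimal illustration and not the thesis's implementation: the distance form sqrt(sum_i w_i * (x_i - y_i)^2), the function names, the single centroid used in place of the clustering step, and the threshold are all assumptions, and the weights are taken as given inputs instead of being derived by principal component analysis.

```python
import math


def weighted_euclidean(x, y, weights):
    """Distance between two records, each index scaled by its weight.

    Assumed form: sqrt(sum_i w_i * (x_i - y_i)^2). The abstract does not
    state the exact formula, so this is one plausible reading.
    """
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("x, y and weights must have the same length")
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))


def isolated_points(records, weights, threshold):
    """Indices of records farther than `threshold` from the overall centroid.

    Simplification (assumption): one centroid stands in for the clustering
    step, and `threshold` is a hypothetical cut-off chosen by the caller.
    """
    n = len(records)
    centroid = [sum(column) / n for column in zip(*records)]
    return [
        i
        for i, record in enumerate(records)
        if weighted_euclidean(record, centroid, weights) > threshold
    ]


# Hypothetical records with two indexes each; the last one is far from the rest.
records = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1], [8.0, 9.0]]
weights = [0.7, 0.3]  # hypothetical index weights summing to 1
suspects = isolated_points(records, weights, threshold=3.0)
```

In this toy example only the last record lies beyond the threshold, so it is the one flagged as a suspected isolated point. In a fuller version the weights would come from the principal component analysis step and the centroid from whichever clustering method is used.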
Keywords/Search Tags: Medical insurance fraud, Principal component analysis, Support vector machine, Parameter optimization, Evaluation index