Research On The Classification Ensemble Algorithm For Medical Insurance Anomaly Detection

Posted on:2017-03-24

Degree:Master

Type:Thesis

Country:China

Candidate:X L Li

Full Text:PDF

GTID:2308330485488157

Subject:Computer software and theory

Abstract/Summary:

With the rapid development of the medical insurance system in China, medical insurance fraud has become common. Because fraud takes many forms and is carried out covertly, and because anti-fraud experience is lacking, anti-fraud work in medical insurance currently faces great challenges. Meanwhile, hospital information systems have accumulated large numbers of patient medical records, yet the information in them has not been fully used. Mining the potential value of these records with data mining and anomaly detection techniques therefore offers a new approach to the study of medical insurance anomaly detection.

This thesis studies how to combine classification algorithms from data mining into an ensemble and apply it to medical insurance anomaly detection, in order to improve the detection of abnormal samples. Because the samples in medical insurance records are imbalanced, the dataset must first be balanced, and the relatively balanced samples are then classified. The main research work of the thesis is as follows.

First, for imbalanced medical insurance data, we propose a new hybrid sampling method that combines undersampling based on the K-means clustering algorithm with SMOTE oversampling.

Second, based on selective ensemble theory, we improve the random forest model. The base classifiers are first sorted by F-measure, and the lower-performing ones are filtered out by keeping only a top percentage. Then, using an inconsistency measure, we remove the base classifiers with low F-measure from among those that are highly similar to one another. In this way, the classifiers retained for the ensemble are both accurate and diverse.

Third, we evaluate two schemes in medical insurance anomaly detection experiments. The first balances the imbalanced insurance data with the hybrid sampling method and then classifies it with the improved random forest based on selective ensemble. The second applies the improved random forest based on selective ensemble directly to the imbalanced dataset, but uses SMOTE sampling at each iteration of the random forest process to balance the samples.

Experiments and analysis of the ensemble algorithms show that both improvements raise the anomaly detection performance of random forest on medical insurance data, and that the random forest using SMOTE sampling performs better. Although the improved algorithms improve detection ability, they also increase model training time, so future work will focus on reducing the training time of the improved random forest.
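
The selective ensemble step described in the abstract can be sketched roughly as follows. This is only an illustration, not the thesis's implementation: the abstract does not give the exact inconsistency measure or the thresholds, so the sketch assumes a simple pairwise disagreement rate on a validation set, and the names f_measure, disagreement, select_classifiers, top_percent and min_disagreement are hypothetical.

```python
from itertools import combinations


def f_measure(y_true, y_pred, positive=1):
    """F1 score of the positive (anomalous) class; 0.0 when undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def disagreement(pred_a, pred_b):
    """Fraction of validation samples on which two classifiers differ.

    Assumed stand-in for the inconsistency measure named in the abstract.
    """
    return sum(a != b for a, b in zip(pred_a, pred_b)) / len(pred_a)


def select_classifiers(y_true, predictions, top_percent=0.5, min_disagreement=0.2):
    """Return indices of the base classifiers kept for the ensemble.

    predictions[i] holds classifier i's labels on a validation set.
    top_percent and min_disagreement are illustrative thresholds only.
    """
    scores = [f_measure(y_true, p) for p in predictions]
    # Step 1: rank by F-measure (best first) and keep the top fraction.
    ranked = sorted(range(len(predictions)), key=lambda i: scores[i], reverse=True)
    keep_n = max(1, int(len(ranked) * top_percent))
    kept = ranked[:keep_n]
    # Step 2: within each highly similar pair, drop the lower-ranked member.
    removed = set()
    for i, j in combinations(kept, 2):  # i always ranks at or above j
        if i in removed or j in removed:
            continue
        if disagreement(predictions[i], predictions[j]) < min_disagreement:
            removed.add(j)
    return [i for i in kept if i not in removed]
```

In a random forest setting, predictions would hold each tree's labels on held-out data, and the returned indices would identify the trees that vote in the final ensemble.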

Keywords/Search Tags:

medical insurance system, data mining, classification integration, unbalanced data

Related items

1	The Research And Application Of Data Mining Techniques On Medical Insurance
2	Research Of Data Analysis And Data Mining Technology Based On Medical Insurance Consuming History Records
3	The Application And Research Of Data Mining In Yili Region's Medical Insurance Monitoring
4	Design And Application Of Medical Insurance Management System Based On Data Mining
5	Research On SVM Classification Of Unbalanced Data And Its Application In Identify Poor Students In Colleges And Universities
6	Application Of Data Mining In Medical Insurance Claims Analysis
7	The Research Of Data Mining And OLAP About Medical Insurance In Social Protection System
8	The Application Of Data Mining Technology In The Medical Treatment Insurance System
9	Research On Application Of Data Mining Technology In The Insurance Industry
10	Research On Risk Prevention And Control Of Guiyang Medical Insurance Fund Based On Data Mining Technology