
Research On The Classification Ensemble Algorithm For Medical Insurance Anomaly Detection

Posted on: 2017-03-24
Degree: Master
Type: Thesis
Country: China
Candidate: X L Li
Full Text: PDF
GTID: 2308330485488157
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of the medical insurance system in China, medical insurance fraud is far from rare. Because fraud takes many forms and is carried out covertly, and because anti-fraud experience is lacking, anti-fraud work currently faces great challenges. Meanwhile, hospital information systems have accumulated large numbers of patient medical records, but the information in them has not been fully utilized. Mining the potential value of these records with data mining and anomaly detection techniques therefore offers a new approach to medical insurance anomaly detection.

This thesis studies how to combine data mining classification algorithms into an ensemble and apply it to medical insurance anomaly detection in order to improve the detection of abnormal samples. Because the classes in medical insurance records are imbalanced, the dataset must first be balanced, and the relatively balanced samples are then classified. The main research work of the thesis is as follows.

First, for imbalanced medical insurance data, we propose a new hybrid sampling method that combines undersampling based on the K-means clustering algorithm with SMOTE oversampling.

Second, we improve the random forest model using selective ensemble theory. The base classifiers are first ranked by F-measure, and the lower-performing ones are filtered out with a top-percent cutoff. Then, using an inconsistency measure, base classifiers with low F-measure are removed from among those that are highly similar to one another. This ensures that the classifiers to be combined are both accurate and diverse.

Third, we evaluate two schemes in medical insurance anomaly detection experiments. The first balances the imbalanced insurance data with the hybrid sampling method and then classifies it with the improved random forest based on selective ensemble. The second applies the improved random forest based on selective ensemble directly to the imbalanced dataset, but uses SMOTE sampling in each iteration of the random forest procedure to balance the samples.

Experiments and analysis of the ensemble algorithms show that both improvements enhance random forest performance on medical insurance anomaly detection, and that the random forest using SMOTE sampling performs better. While the improved algorithms raise detection ability, they also increase model training time, so future work will focus on reducing the training time of the improved random forest.
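The hybrid sampling idea named in the abstract (K-means-based undersampling of the majority class plus SMOTE on the minority class) might look roughly like the sketch below. It is only an illustration, not the thesis's implementation: the function names, the proportional per-cluster quota, the shared `target` size for both classes, and all default parameters are assumptions made here, and the code uses only the Python standard library.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns a list of k clusters (lists of points)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return clusters

def kmeans_undersample(majority, n_keep, k=3, seed=0):
    """Keep roughly n_keep majority samples, drawn from each cluster in
    proportion to its size (assumed quota rule, hypothetical)."""
    rng = random.Random(seed)
    kept = []
    for cl in kmeans(majority, k, seed=seed):
        quota = round(n_keep * len(cl) / len(majority))
        kept.extend(rng.sample(cl, min(quota, len(cl))))
    return kept

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by interpolating between a
    random sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda p: sq_dist(base, p))[:k]
        mate = rng.choice(neighbors)
        gap = rng.random()
        synthetic.append(tuple(b + gap * (m - b) for b, m in zip(base, mate)))
    return synthetic

def hybrid_sample(majority, minority, target, k_clusters=3, k_neighbors=3, seed=0):
    """Shrink the majority and grow the minority to about `target` samples each."""
    kept = kmeans_undersample(majority, target, k_clusters, seed)
    new = smote(minority, max(0, target - len(minority)), k_neighbors, seed)
    return kept, minority + new
```

Calling `hybrid_sample(majority, minority, target=50)` on lists of feature tuples returns a reduced majority list and an enlarged minority list. Because the per-cluster quotas are rounded, the majority size is only approximately `target`.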
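The selective-ensemble pruning described in the abstract (rank base classifiers by F-measure, keep a top percentage, then drop the weaker member of highly similar pairs) could be sketched as follows. This is a hedged illustration only: the abstract does not give the exact inconsistency measure or thresholds, so the pairwise disagreement rate, the greedy best-first pass, the 0/1 label encoding, and the parameter names `top_percent` and `min_disagreement` are all assumptions introduced here.

```python
def f_measure(y_true, y_pred, positive=1):
    """F1 score of the positive (anomalous) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def disagreement(pred_a, pred_b):
    """Fraction of validation samples on which two classifiers disagree
    (an assumed stand-in for the thesis's inconsistency measure)."""
    return sum(a != b for a, b in zip(pred_a, pred_b)) / len(pred_a)

def select_ensemble(y_true, predictions, top_percent=0.5, min_disagreement=0.1):
    """Return indices of the base classifiers kept after both pruning steps.

    predictions: one list of validation-set predictions per base classifier.
    """
    scores = [f_measure(y_true, p) for p in predictions]
    # Step 1: rank by F-measure and keep only the best top_percent.
    ranked = sorted(range(len(predictions)), key=lambda i: scores[i], reverse=True)
    n_keep = max(1, int(len(ranked) * top_percent))
    candidates = ranked[:n_keep]
    # Step 2: go from best to worst and skip any classifier that is too
    # similar to a better one already selected.
    selected = []
    for i in candidates:
        if all(disagreement(predictions[i], predictions[j]) >= min_disagreement
               for j in selected):
            selected.append(i)
    return selected
```

In a random forest setting, each entry of `predictions` would hold one tree's predictions on a held-out validation set, and the returned indices identify the trees that take part in the final vote.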
Keywords/Search Tags: medical insurance system, data mining, classification integration, unbalanced data