Research And Application Of Medical Insurance Fraud Detection Based On Hadoop Platform

Posted on: 2018-05-25 | Degree: Master | Type: Thesis
Country: China | Candidate: H J Chen | Full Text: PDF
GTID: 2359330512489115 | Subject: Computer software and theory
Abstract/Summary:
As medical care and the economy have improved, China's medical insurance coverage has become very broad, and people enjoy real benefits from health insurance policy. At the same time, the abuse of health care funds has become increasingly serious: more and more funds are obtained by fraud, and it is time to fight it. At present, health insurance agencies mainly use a rules engine to audit settlement information. However, the rules rely on only a few indicators, are incomplete, and cannot be updated in time, so they can easily be circumvented by carefully falsified data. This paper analyzes the characteristics of medical insurance data and establishes a fraud detection process using data mining technology. The main contents are as follows.

1. Feature engineering. For historical reasons, the existing data sets contain many flaws. The raw data are processed in several steps, including noise removal, recovery of missing data, and feature selection.

2. Fraud detection based on the DBSCAN algorithm. Because the data are extremely imbalanced, we analyze the effect of unsupervised algorithms in fraud detection. The paper compares the results of various clustering algorithms on the data set and uses the DBSCAN algorithm to identify the abnormal cluster.

3. Accurate detection using a density-based sampling method and the Random Forest algorithm. Building on the clustering result, a density-based sampling method is proposed to re-balance the data. The sampling information is then used in the model selection of Random Forest. Combining classification and clustering algorithms improves accuracy and yields a complete fraud detection framework.

4. Parallel implementation on the Hadoop platform. For big data scenarios, we propose parallel versions of the Random Forest and DBSCAN algorithms and implement them on the Hadoop platform using MapReduce.

This paper applies data mining technology to the field of medical insurance anomaly detection and makes several contributions. Firstly, it is no longer limited to modeling specific fraud situations, which makes it possible to identify uncommon data and gives the algorithm stronger generalization ability. Secondly, a density-based sampling method is proposed to combine the DBSCAN algorithm with the Random Forest algorithm, so that over-fitting is effectively controlled while high accuracy is maintained. Thirdly, we put forward a data partitioning method for the parallel implementation that embodies the idea of load balancing.
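
To illustrate the clustering step described in item 2, the following is a minimal, single-machine sketch of the textbook DBSCAN procedure in plain Python. It is not the thesis's implementation or its parallel MapReduce version. The `claims` data, the two feature dimensions, and the `eps` and `min_pts` values are hypothetical and chosen only so that the example is easy to follow.

```python
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, NOISE for outliers."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue  # already assigned to some cluster
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: keep expanding
    return labels


# Hypothetical, already-scaled claim features (made up for illustration).
claims = [
    (1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (1.0, 1.2), (1.2, 1.0),  # dense group A
    (5.0, 5.0), (5.1, 4.9), (4.9, 5.1), (5.0, 5.2),              # dense group B
    (9.0, 0.5),                                                  # isolated record
]
labels = dbscan(claims, eps=0.5, min_pts=3)
outliers = [claims[i] for i, lab in enumerate(labels) if lab == NOISE]
print(labels)    # [0, 0, 0, 0, 0, 1, 1, 1, 1, -1]
print(outliers)  # [(9.0, 0.5)]
```

In this sketch the isolated record is labelled as noise because it has no neighbors within `eps`. In an anomaly-detection setting such records, or small low-density clusters, would be the candidates passed on for closer inspection.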
Keywords/Search Tags: clustering, classification, abnormal detection, Hadoop