
Identify Abnormal Data Objects In Medical Insurance Based On Outlier Detection Method

Posted on: 2017-01-22
Degree: Master
Type: Thesis
Country: China
Candidate: H W Guan
Full Text: PDF
GTID: 2284330485480016
Subject: Software engineering
Abstract/Summary:
With the rapid development of computer science and database technology, outlier detection is widely applied to network intrusion detection, fraud detection, case studies, business analysis, and other fields. In statistics, an outlier is an observation that lies far from the other observations. The mechanism that generates an outlier usually differs from the one that generates normal objects, so an outlier may carry more important information. In health insurance, moral hazard unavoidably leads to waste, fraud, and abuse in the operation of medical insurance. These irregularities cause substantial economic loss and have seriously affected the sustainable development of the health insurance system. Using outlier detection, we detect abnormal data objects in health insurance data. These abnormal objects can be treated as suspected fraud cases, which helps supervisors carry out their inspection work.

The health insurance dataset is large and complex, which makes outlier detection challenging. There are three major problems. First, the dataset contains many irrelevant or redundant attributes and data objects, which can slow down the mining process and reduce accuracy. Second, the medical dataset is high-dimensional and sparse, which makes conventional outlier detection methods unsuitable. Third, the clinical pathway dataset contains both sparse and dense regions, and the distributions of its dimensions vary.
These characteristics make outlier detection difficult.

Based on the data characteristics of the health insurance dataset, the main contributions and innovations of this paper are the following three points:

This paper proposes a novel method based on rough set theory for attribute reduction and relevant set extraction. The health insurance dataset contains some superfluous attributes, which may affect the accuracy of outlier detection. Based on rough set theory, we remove the attributes that have low discriminating power. We propose the concept of a "relevant set" and put forward an algorithm for extracting it. The relevant set is a subset of the universe set, but it contains essentially the same outliers as the universe set. The relevant set therefore narrows the search space for outlier detection.

This paper proposes an outlier detection algorithm, SLOF, to detect abnormal medical data objects. We aim to detect outliers among records for the same disease in the medical dataset. The dataset is high-dimensional and sparse, which makes outlier detection difficult. According to these features, we propose SLOF, an outlier detection algorithm based on local density. We use vectors to represent the complex high-dimensional objects in the dataset and compute the distances between objects. We calculate the density of the neighbors and then work out the degree of deviation of each object. Using ensemble learning, we combine the results computed with different parameters to obtain an outlier score for each object. If the outlier score of an object exceeds the threshold, we classify the object as an outlier.

This paper proposes an outlier detection algorithm, CODM, for detecting abnormal clinical pathways. We detect abnormal objects in the clinical pathway dataset. The dataset contains both dense and sparse regions, and the distribution across dimensions is inconsistent.
Given these data characteristics, we propose CODM, an outlier detection algorithm for abnormal clinical pathway objects. The algorithm first bags the features, that is, it samples the feature set multiple times. It then computes the distances between objects based on the concept of pathSim. Finally, it detects outliers based on the interval among the neighbors.
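
The local-density scoring and parameter ensemble described for SLOF can be illustrated with a short sketch. This is not the thesis's SLOF algorithm, whose details the abstract does not give. It assumes the standard local outlier factor (LOF) with Euclidean distance, simplified to exactly k nearest neighbors, and averages the scores over several neighborhood sizes. The function names, the k values, and the threshold are hypothetical choices made for illustration.

```python
import numpy as np


def lof_scores(X, k):
    """Standard LOF score of every row of X for neighborhood size k.

    Simplification (assumption): each point uses exactly its k nearest
    neighbors, so distance ties are not expanded. Requires k < len(X).
    """
    n = X.shape[0]
    # Pairwise Euclidean distances; the diagonal is set to inf so that a
    # point is never counted as its own neighbor.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)

    # Indices of the k nearest neighbors and each point's k-distance.
    nn = np.argsort(dist, axis=1)[:, :k]
    k_dist = dist[np.arange(n), nn[:, -1]]

    # Reachability distance of p from neighbor o: max(k_dist(o), d(p, o)).
    reach = np.maximum(k_dist[nn], dist[np.arange(n)[:, None], nn])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = 1.0 / (reach.mean(axis=1) + 1e-12)

    # LOF: mean density of the neighbors divided by the point's own density.
    return lrd[nn].mean(axis=1) / lrd


def ensemble_outlier_scores(X, ks=(3, 5, 7)):
    """Average the LOF scores obtained with several neighborhood sizes."""
    X = np.asarray(X, dtype=float)
    return np.mean([lof_scores(X, k) for k in ks], axis=0)


def detect_outliers(X, ks=(3, 5, 7), threshold=1.5):
    """Return indices whose averaged score exceeds the (hypothetical) threshold."""
    scores = ensemble_outlier_scores(X, ks)
    return np.flatnonzero(scores > threshold), scores
```

A score near 1 means an object is about as dense as its neighbors, while a much larger score marks an object in a far sparser region. Averaging over several values of k is one simple way to reduce sensitivity to a single parameter choice; other combination rules, such as taking the maximum, are equally plausible.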
Keywords/Search Tags: health insurance, outlier detection, rough set theory, abnormal medical data, clinical pathways