Application of Data Mining Technology in Medical Cost Data

Posted on: 2016-04-21  Degree: Master  Type: Thesis
Country: China  Candidate: K Zhang
GTID: 2298330467492900  Subject: Computer Science and Technology
Abstract/Summary:
In recent years, with the rapid development of computer technology in health care, medical information systems, disease diagnosis systems, and medical imaging systems have come into wide use, not only in large general hospitals but in hospitals at all levels. The NCMS (New Rural Cooperative Medical Care System) was introduced gradually from 2003 and achieved basic coverage of rural residents by 2010, and the government has continued to increase its investment in it. Over time, the system has generated a large amount of health data. However, medical resources are still allocated unevenly, and a large gap in the level of health care remains between urban and rural areas. To address this problem, the National Twelfth Five-Year Technology Support Program project "basic rural health key technology research and demonstration" was started in hospitals below county level across the country. A major task of the program is to mine the NCMS data and provide support for policy making.

This thesis studies and compares the applicability of different algorithms to these data, evaluates the results by prediction accuracy, and presents them on a visualization platform. The raw medical cost data that were collected differ from other kinds of data in characteristics such as authenticity, privacy, diversity, incompleteness, and redundancy. For this reason, the research required extensive, repeated preprocessing, including data reduction, data transformation, and the handling of exceptions and errors. Once a unified data format had been formed, preliminary statistics (mean, variance, and quartiles) were computed on the data for eight disease systems across five levels of hospitals.
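The preliminary statistics described above can be illustrated with a minimal sketch. It is not the thesis's code: the record layout, the hospital-level and disease-system labels, and the cost values are all hypothetical, chosen only to show a per-group mean, variance, and quartile summary on right-skewed data.

```python
import statistics
from collections import defaultdict

# Hypothetical records: (hospital level, disease system, cost per case).
# The labels and values are illustrative only, not data from the thesis.
records = [
    ("township", "blood", 300.0), ("township", "blood", 350.0),
    ("township", "blood", 420.0), ("township", "blood", 500.0),
    ("township", "blood", 1900.0),
    ("county", "blood", 820.0), ("county", "blood", 950.0),
    ("county", "blood", 1100.0), ("county", "blood", 1500.0),
    ("county", "blood", 6200.0),
]

def summarize_by_group(rows):
    """Return mean, sample variance, and quartiles for each (level, system) group."""
    groups = defaultdict(list)
    for level, system, cost in rows:
        groups[(level, system)].append(cost)
    summary = {}
    for key, costs in groups.items():
        q1, median, q3 = statistics.quantiles(costs, n=4)
        summary[key] = {
            "mean": statistics.mean(costs),
            "variance": statistics.variance(costs),
            "q1": q1,
            "median": median,
            "q3": q3,
        }
    return summary

summary = summarize_by_group(records)
for key, stats in summary.items():
    print(key, stats)
```

In both hypothetical groups the mean lies above the median, which is the usual signature of a right-skewed cost distribution.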
Based on these statistical results, the thesis analyzes the shortcomings of purely statistical analysis and the benefits of data mining, and points out that the data follow a skewed normal distribution. It also argues that classifying costs by quartiles, or by personal experience alone, as other researchers have done, is unscientific and unreasonable. The thesis therefore bases its processing on the K-means clustering algorithm. K-means is an unsupervised, self-learning method used to explore and characterize a data set; it is not easily affected by the characteristics of the data distribution, and it takes the characteristics of the data themselves into account. The C4.5 decision tree algorithm is then used to analyze the factors that influence the medical cost data after they have been processed with K-means. Measured by prediction accuracy, the K-means approach shows a substantial advantage over the quartile-based method on the blood disease data. The thesis then describes the visualization platform for medical data, which differs from ordinary statistical visualization tools in offering dynamic display, interactivity, and simple calculation. Finally, it offers possible explanations for the influence factors that were found.
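The contrast between quartile classification and K-means can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the cost values, the number of clusters (three), the deterministic initialization, and the binary attribute `surgery` are all hypothetical. The last part computes the gain ratio, the split criterion that C4.5 uses, for that attribute against the K-means cost classes.

```python
import math
import statistics
from collections import Counter

def nearest(x, centroids):
    """Index of the centroid closest to x (ties go to the lower index)."""
    return min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))

def kmeans_1d(values, k, max_iters=100):
    """Lloyd's algorithm on scalar values; returns the sorted centroids."""
    data = sorted(values)
    # Deterministic start (an assumption): centroids at evenly spaced ranks.
    centroids = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    for _ in range(max_iters):
        clusters = [[] for _ in range(k)]
        for x in data:
            clusters[nearest(x, centroids)].append(x)
        # An empty cluster keeps its previous centroid.
        updated = [sum(c) / len(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)

def quartile_bin(x, cuts):
    """Class 0..3 according to how many quartile cut points x exceeds."""
    return sum(x > c for c in cuts)

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(attribute, labels):
    """Information gain of `attribute` on `labels`, divided by split information."""
    n = len(labels)
    groups = {}
    for a, y in zip(attribute, labels):
        groups.setdefault(a, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = entropy(attribute)
    return (entropy(labels) - remainder) / split_info if split_info > 0 else 0.0

# Hypothetical right-skewed costs (illustrative values, not thesis data).
costs = [300, 320, 350, 380, 400, 420, 450, 480, 2000, 2100, 2200, 9000]

centroids = kmeans_1d(costs, k=3)
kmeans_labels = [nearest(x, centroids) for x in costs]

cuts = statistics.quantiles(costs, n=4)
quartile_labels = [quartile_bin(x, cuts) for x in costs]

# Hypothetical influence factor aligned with `costs`.
surgery = ["no"] * 8 + ["yes"] * 4
ratio = gain_ratio(surgery, kmeans_labels)

print("K-means classes: ", kmeans_labels)
print("Quartile classes:", quartile_labels)
print("Gain ratio of 'surgery' on K-means classes:", ratio)
```

On this toy data the quartile cut points force equal-sized classes, so a cost of 480 and a cost of 2000 land in the same class, while K-means places its boundaries in the gaps of the distribution. Which discretization predicts better on real data is an empirical question of the kind the thesis answers with prediction accuracy.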
Keywords/Search Tags: Data Mining, K-means Clustering, C4.5 Decision Tree, Medical Cost, Visualization