Data Preparation For Risk Control Of Medical Insurance Fund

Posted on: 2011-11-13
Degree: Master
Type: Thesis
Country: China
Candidate: Y M Xu
GTID: 2178360305997321
Subject: Computer software and theory
Abstract/Summary:
The Risk Prevention Platform of Medical Insurance Fund (RPPMIF) identifies and evaluates risks in the management of medical insurance and applies appropriate methods to prevent and handle them. It is in essence a decision support system, a major application area of KDD (Knowledge Discovery and Data Mining). Data preparation is a critical part of the KDD process because it supplies high-quality data to data mining algorithms. Motivated by the complex data types of the data sources, disorganized metadata management, the low maintainability of the ETL process, and missing data in RPPMIF, this thesis studies the related data preparation techniques.

First, the types and contents of the data sources and targets are analyzed, and the existing data quality problems are identified. The thesis then designs the ETL and metadata management strategy. It studies missing data imputation techniques in depth and proposes a new imputation algorithm. It also uses regular expressions and pattern matching to crawl, clean, and parse data from semi-structured sources.

Second, the thesis improves the ETL method, builds a metadata management model for the ETL process together with a metadata repository, and designs and implements an ETL tool and a metadata management tool. Both tools are deployed in RPPMIF. Experiments show that, compared with the original methods, the ETL tool is more automated, requires less manual work, is easier to maintain, and is effective. The metadata management tool collects metadata and lets users and developers query it and analyze data lineage. Together, the tools help guarantee data quality for building the model library of RPPMIF.
Keywords/Search Tags: Data Preparation, Missing Data Imputation, ETL, Metadata Management, Data Cleaning, Data Quality
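
As a rough illustration of two steps named in the abstract (parsing semi-structured data with regular expressions, then filling in missing values), the following Python sketch may help. It is not the thesis's method: the record layout, the field names `claim_id` and `amount`, and the helpers `parse_records` and `impute_mean` are all hypothetical, and plain mean imputation stands in as a common baseline for the new imputation algorithm, which the abstract does not describe.

```python
import re
from statistics import mean

# Hypothetical record layout: the abstract does not describe the real sources.
# Accepts "claim_id: A001; amount = 120.50" style lines with loose punctuation.
RECORD_RE = re.compile(
    r"claim_id[ \t]*[:=][ \t]*(?P<claim_id>\w+)[ \t]*[;,][ \t]*"
    r"amount[ \t]*[:=][ \t]*(?P<amount>\d+(?:\.\d+)?)?",
    re.IGNORECASE,
)


def parse_records(text):
    """Extract records from loosely formatted text; a blank amount becomes None."""
    rows = []
    for match in RECORD_RE.finditer(text):
        raw_amount = match.group("amount")
        rows.append({
            "claim_id": match.group("claim_id"),
            "amount": float(raw_amount) if raw_amount is not None else None,
        })
    return rows


def impute_mean(rows, field="amount"):
    """Baseline imputation (an assumption, not the thesis's algorithm):
    replace missing values with the mean of the observed ones."""
    observed = [r[field] for r in rows if r[field] is not None]
    if not observed:
        return rows  # nothing to estimate the mean from
    fill = mean(observed)
    return [
        {**r, field: fill if r[field] is None else r[field]}
        for r in rows
    ]


raw = """
Claim_ID: A001 ; Amount = 120.50
claim_id=A002, amount:
claim_id : A003; amount: 80
"""

rows = impute_mean(parse_records(raw))
```

Mean imputation is chosen here only because it is the simplest widely used baseline; it ignores relationships between fields, which is one reason more elaborate imputation methods are studied.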