Design And Implementation Of Clustering Analysis System For Medical Data

Posted on: 2019-10-21
Degree: Master
Type: Thesis
Country: China
Candidate: S Wang
GTID: 2404330578983402
Subject: Engineering
Abstract/Summary:
With the rapid development of medical informatization, a large number of medical information products have emerged. These products, such as electronic medical record systems, medical record management systems and resident management systems, record and store large volumes of medical data every day, including patient medical records and diagnostic information, and the accumulated volume grows substantially every year. Because these products come from different suppliers, follow different database design standards and impose non-standard data entry requirements, a large amount of "dirty data" is produced. This data seriously degrades overall data quality and has a large negative impact on data association and mining analysis. To improve data quality and mining efficiency, this paper designs and implements a clustering analysis system for medical data, and describes its database design, data cleaning, clustering algorithm implementation and testing. The main work includes the following points:

1. Based on data from China's current information platform, an open source ETL tool is used to clean and convert the medical data. Key points and difficulties in the data extraction process are analyzed in light of the current state of the data, and the data samples for analysis are obtained.
2. An intermediate database is established on Oracle. It comprises a dictionary table, a code table, source database tables, and dimension tables that associate diseases with indicators. These tables store the dictionary data, source data and cleaned dimensional data for different companies.
3. Clustering analysis algorithms are applied to analyze the data further. They measure the relationship between diseases and certain patient attributes and give a preliminary picture of the potential relations between indicators and diseases. Threshold values between these attributes and the diseases they induce are calculated, providing a reasonable prediction and auxiliary warning for human health management.
4. Two different clustering methods are applied to the sample data. Clustering evaluation measures are then used to analyze the accuracy and the reference value of the clustering results.
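The abstract does not name the two clustering methods or the evaluation measure, so the following is only a minimal sketch of that last step under stated assumptions: it uses k-means and single-linkage agglomerative clustering as stand-ins for the two methods and the mean silhouette coefficient as the evaluation measure. The function names and the toy two-feature samples are hypothetical and are not taken from the thesis.

```python
import math  # math.dist requires Python 3.8+


def kmeans(points, k, iters=100):
    """Plain k-means. Assumption: the first k points seed the centroids."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Recompute each centroid as the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, l in zip(points, new_labels) if l == c]
            if members:
                new_centroids.append(
                    tuple(sum(x) / len(members) for x in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_labels == labels and new_centroids == centroids:
            break  # converged
        labels, centroids = new_labels, new_centroids
    return labels, centroids


def agglomerative(points, k):
    """Single-linkage agglomerative clustering, merging until k clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(
                    math.dist(points[a], points[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))
    labels = [0] * len(points)
    for c, members in enumerate(clusters):
        for m in members:
            labels[m] = c
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient; singleton clusters contribute 0."""
    ids = set(labels)
    if len(ids) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for i, p in enumerate(points):
        own = [q for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:
            continue
        a = sum(math.dist(p, q) for q in own) / len(own)
        b = min(
            sum(math.dist(p, q) for q, l in zip(points, labels) if l == c)
            / labels.count(c)
            for c in ids
            if c != labels[i]
        )
        total += (b - a) / max(a, b)
    return total / len(points)


# Hypothetical, already-normalized samples with two features per patient.
points = [(1.0, 1.0), (5.0, 5.2), (1.2, 0.8), (5.1, 4.9), (0.9, 1.1), (4.8, 5.0)]
km_labels, _ = kmeans(points, 2)
ag_labels = agglomerative(points, 2)
print("k-means silhouette:", silhouette(points, km_labels))
print("agglomerative silhouette:", silhouette(points, ag_labels))
```

Scoring both label sets with the same function makes the two methods directly comparable on one sample. A real evaluation would also need feature scaling and a justified choice of the number of clusters.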
Keywords: Data mining, Clustering analysis algorithm, Medical data