Research On Medical Data Imputation And Classification Methods Based On Similarity Measurement

Posted on: 2020-11-27
Degree: Master
Type: Thesis
Country: China
Candidate: Y Yin
Full Text: PDF
GTID: 2404330575996896
Subject: Computer technology
Abstract/Summary:
With the development of information science and technology, the volume of medical data is growing explosively, and this mass of data provides a solid basis for health big data. Medical data analysis based on machine learning for disease prediction and diagnosis has become a research hotspot. Medical data have their own characteristics: medical data sets contain correlated attributes and missing values. These characteristics pose new challenges for analysis methods based on medical data. To meet the practical needs of medical data research, this paper designs a new similarity measurement method that takes attribute correlation into account. Based on this measurement, a data imputation method and data classification methods are designed to fill in missing medical data, classify the data, and ultimately support the assisted diagnosis of diseases. The main contents are as follows:

(1) A new similarity measurement is proposed. The method first calculates the correlation coefficients among the attributes in the data set, then uses a kernel function to transform them into weights that can be used directly in calculation, and finally defines a new similarity measurement that fully considers the attribute associations in the data set.

(2) A new data imputation algorithm is proposed. It uses the new similarity measurement to select similar samples and calculates the missing values by weighted linear regression. Several algorithms, such as the K-nearest-neighbor classification method and the least-squares classification method, are selected as baselines, and comparative experiments are performed on medical data sets such as an Alzheimer's disease data set and an arrhythmia data set. The results show that the proposed method outperforms the other algorithms by 4.4% to 12.2% in imputation accuracy and maintains the lowest root mean square error.

(3) Two improved data classification methods are proposed. Both are based on the new similarity measurement: one improves the K-nearest-neighbor classification algorithm, and the other improves the support vector machine classification algorithm. The same comparative experiments are carried out on the medical data sets. The results show that the proposed methods exceed traditional classification methods such as the support vector machine and C4.5 in classification accuracy by 2.2% to 9.9%.
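
The abstract does not give the formulas, so the following is only a minimal sketch of how the similarity measurement in (1) and the imputation step in (2) might fit together. Everything specific in it is an assumption made for illustration rather than the thesis's definition: Pearson correlation as the correlation coefficient, a Gaussian kernel with a hypothetical bandwidth `sigma`, a weighted Euclidean distance turned into a similarity, the neighbor count `k`, and the function names themselves. It also assumes at least two observed attributes and a set of complete reference samples.

```python
import numpy as np

def attribute_weights(X, sigma=1.0):
    """Turn inter-attribute correlations into per-attribute weights (assumed scheme)."""
    corr = np.corrcoef(X, rowvar=False)      # Pearson correlation between columns
    np.fill_diagonal(corr, 0.0)              # ignore each attribute's self-correlation
    mean_abs = np.abs(corr).sum(axis=0) / (X.shape[1] - 1)
    # Gaussian kernel: attributes more correlated with the rest get larger weights.
    w = np.exp(-((1.0 - mean_abs) ** 2) / (2.0 * sigma ** 2))
    return w / w.sum()

def weighted_similarity(a, b, w):
    """Similarity in (0, 1] derived from a weighted Euclidean distance."""
    d = np.sqrt(np.sum(w * (a - b) ** 2))
    return 1.0 / (1.0 + d)

def impute_value(X_complete, row, missing_idx, k=5, sigma=1.0):
    """Estimate row[missing_idx] from the k most similar complete samples."""
    observed = [j for j in range(X_complete.shape[1]) if j != missing_idx]
    w = attribute_weights(X_complete[:, observed], sigma)
    sims = np.array([weighted_similarity(x[observed], row[observed], w)
                     for x in X_complete])
    nearest = np.argsort(sims)[-k:]          # indices of the k most similar samples

    # Weighted linear regression: scale each sample by sqrt(similarity),
    # then solve an ordinary least-squares problem on the scaled system.
    A = np.column_stack([np.ones(len(nearest)), X_complete[nearest][:, observed]])
    y = X_complete[nearest, missing_idx]
    sw = np.sqrt(sims[nearest])
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return float(np.concatenate([[1.0], row[observed]]) @ coef)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features = rng.normal(size=(50, 2))
    target = 2.0 * features[:, 0] - features[:, 1] + 3.0   # synthetic linear attribute
    data = np.column_stack([features, target])
    incomplete = np.array([0.5, -0.2, np.nan])              # third attribute is missing
    print(impute_value(data, incomplete, missing_idx=2, k=10))
```

The synthetic data in the usage example are made up purely to exercise the code. Because the missing attribute there is an exact linear function of the others, the weighted regression should recover it closely. Real medical data would call for validation against held-out values, as in the accuracy and root mean square error comparisons the abstract reports.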
Keywords/Search Tags: medical data, similarity measurement, data imputation, data classification, assistant diagnosis