Research On Medical Data Imputation And Classification Methods Based On Similarity Measurement

Posted on: 2020-11-27

Degree: Master

Type: Thesis

Country: China

Candidate: Y Yin


GTID: 2404330575996896

Subject: Computer technology

Abstract/Summary:


With the development of information science and technology, the volume of medical data is growing at an "explosive" rate. These massive data provide a solid foundation for health big data, and machine-learning-based analysis of medical data for disease prediction and diagnosis has become a research hotspot. Medical data have their own characteristics: medical data sets contain correlated attributes and missing values. These characteristics pose new challenges for analysis methods built on medical data. To meet the practical needs of medical data research, this thesis designs a new similarity measurement method that takes attribute correlation into account. Based on this measurement, a data imputation method and data classification methods are designed to handle missing medical data, classify the data, and ultimately support assisted diagnosis of diseases. The main contents are as follows:

(1) A new similarity measurement is proposed. The method first computes the correlation coefficients among the attributes of the data set, then uses a kernel function to transform them into weights that can be used directly in the calculation, and finally defines a similarity measure that fully accounts for the associations among attributes.

(2) A new data imputation algorithm is proposed. It uses the new similarity measurement to select similar samples and computes the missing values by weighted linear regression. Several algorithms, such as the K-nearest neighbor method and the least squares method, are selected as baselines, and comparative experiments are performed on medical data sets such as an Alzheimer's disease data set and an arrhythmia data set. The results show that the proposed method outperforms the other algorithms by 4.4% to 12.2% in imputation accuracy while achieving the lowest root mean square error.

(3) Two improved data classification methods are proposed, both based on the new similarity measurement: one improves the K-nearest neighbor classification algorithm, and the other improves the support vector machine classification algorithm. The same comparative experiments are carried out on the medical data sets. The results show that the proposed methods exceed traditional classification methods such as the support vector machine and C4.5 in classification accuracy by 2.2% to 9.9%.
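The abstract does not give the exact formulas for the correlation-based weights, the kernel transform, or the regression step, so the Python sketch below shows only one plausible reading of items (1), (2) and the K-nearest neighbor part of (3), under loudly stated assumptions. All function names (pearson, attribute_weights, similarity, most_similar, impute, knn_classify) are hypothetical. It assumes that each attribute's weight is its mean absolute Pearson correlation with the other attributes passed through a Gaussian kernel exp(-gamma * (1 - c)^2), that similarity is a Gaussian kernel on the weighted squared distance over co-observed attributes, that imputation regresses the missing attribute on a single most-correlated observed attribute using the k most similar complete samples weighted by similarity, and that classification is a similarity-weighted vote. The toy data are made up for illustration and are not from the thesis.

```python
import math
from collections import defaultdict


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def attribute_weights(complete_rows, gamma=1.0):
    """Hypothetical weighting: mean |correlation| of each attribute with the
    others, mapped through a Gaussian kernel exp(-gamma * (1 - c)^2)."""
    cols = list(zip(*complete_rows))
    p = len(cols)
    weights = []
    for j in range(p):
        c = sum(abs(pearson(cols[j], cols[k])) for k in range(p) if k != j) / (p - 1)
        weights.append(math.exp(-gamma * (1.0 - c) ** 2))
    return weights


def similarity(x, y, weights, sigma=1.0):
    """Gaussian-kernel similarity on the weighted squared distance, using only
    attributes observed (not None) in both rows."""
    used = [j for j in range(len(x)) if x[j] is not None and y[j] is not None]
    if not used:
        return 0.0
    d2 = sum(weights[j] * (x[j] - y[j]) ** 2 for j in used) / sum(weights[j] for j in used)
    return math.exp(-d2 / (2 * sigma ** 2))


def most_similar(x, rows, weights, k):
    """(similarity, index) pairs for the k rows most similar to x."""
    scored = [(similarity(x, r, weights), i) for i, r in enumerate(rows)]
    return sorted(scored, reverse=True)[:k]


def impute(row, complete_rows, weights, k=3):
    """Fill each None in `row` by similarity-weighted linear regression over the
    k most similar complete rows. Assumption: a single predictor, namely the
    observed attribute most correlated with the missing one."""
    neighbors = most_similar(row, complete_rows, weights, k)
    cols = list(zip(*complete_rows))
    observed = [j for j, v in enumerate(row) if v is not None]
    w = [s for s, _ in neighbors]  # regression weights = similarities
    filled = list(row)
    for m, v in enumerate(row):
        if v is not None:
            continue
        j = max(observed, key=lambda o: abs(pearson(cols[m], cols[o])))
        xs = [complete_rows[i][j] for _, i in neighbors]
        ys = [complete_rows[i][m] for _, i in neighbors]
        sw = sum(w)
        xbar = sum(wi * xi for wi, xi in zip(w, xs)) / sw
        ybar = sum(wi * yi for wi, yi in zip(w, ys)) / sw
        sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, xs))
        sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        filled[m] = ybar + slope * (row[j] - xbar)
    return filled


def knn_classify(x, train_rows, labels, weights, k=3):
    """Similarity-weighted vote among the k most similar training rows."""
    votes = defaultdict(float)
    for s, i in most_similar(x, train_rows, weights, k):
        votes[labels[i]] += s
    return max(votes, key=votes.get)


# Made-up toy data: attribute 1 is roughly 2 x attribute 0.
complete = [
    [1.0, 2.0, 0.5],
    [2.0, 4.1, 0.4],
    [3.0, 6.0, 0.6],
    [4.0, 7.9, 0.5],
    [5.0, 10.0, 0.45],
]
labels = ["A", "A", "A", "B", "B"]
attr_w = attribute_weights(complete)
filled = impute([3.5, None, 0.5], complete, attr_w)
pred = knn_classify([4.8, 9.6, 0.46], complete, labels, attr_w)
```

On this toy data the missing second attribute of [3.5, None, 0.5] is estimated close to 7, following the roughly linear relation between the first two attributes, and the query [4.8, 9.6, 0.46] is assigned to class "B". Restricting the distance to co-observed attributes is what lets an incomplete sample be compared with complete ones before its gaps are filled; the support vector machine variant in (3) is not sketched here.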

Keywords/Search Tags:

medical data, similarity measurement, data imputation, data classification, assistant diagnosis


Related items

1	Research On Application Of Missing Data Imputation In Medical Field
2	Evaluations And Applications On Several Imputation Approaches Of Integrated Omics Data
3	Research And Development Of Feature Optimization Algorithm For Heterogeneous Health Big Data Diagnosis And Treatment Model
4	Research And Implementation Of Medical Data Life Cycle For Big Data
5	Research And Application On Key Techniques In ICU-oriented Medical Data Mining
6	Research And Application Of Medical Heterogeneous Data Integration Based On Ontology
7	Medical Data Set Filling And Classification Based On Machine Learning
8	Research On Identification Of Pan-cancer Common Driver Modules Based On Imputation Data
9	Self-Learning And Dual-View Network For Spinal Metastatic Assistant Diagnosis
10	Construction And Analysis Application Of Medical Case Report Literature Library Based On Big Data Technology