Research On Clustering Algorithm Of Disease Risk Factors Based On Big Data Technology

Posted on:2020-04-29

Degree:Master

Type:Thesis

Country:China

Candidate:L Yao

Full Text:PDF

GTID:2514306512487614

Subject:Computer technology

Abstract/Summary:

Big data analysis technology has recently been widely applied. Clustering divides samples into clusters by computing the similarity between them, which helps reveal hidden relationships among samples. In the medical field, clustering can mine latent information and provide decision support for medical researchers. This thesis studies algorithms for extracting disease risk factors based on clustering. The specific work is as follows:

First, a K-means clustering method based on an improved Canopy algorithm is constructed to extract risk factors. Feature selection is used to filter the features, and the improved Canopy algorithm then determines the number of clusters and the initial center points. K-means is then used to mine the internal relations among the feature variables, and the risk factors are extracted by calculating a correlation index. The algorithm performs well in predicting the number of clusters and on other clustering evaluation indicators.

Second, clustering with fixed weights cannot describe the geometry of the data, so the idea of dynamically adjusting weights is applied to the K-means algorithm. The algorithm constructs initial weights with SVM-RFE so that the initial center points lie closer to the distribution of the data, which lets the clustering converge more quickly with fewer iterations. Risk factors are selected according to the weights of the variables. Experiments show that the algorithm achieves better clustering time efficiency and verify the effectiveness of selecting key features based on feature weights.

Third, medical data is intricate and hard to analyze. This thesis combines Gaussian mixture clustering with a hard clustering method, integrating the strengths of both, and applies the combination to medical data; the initial parameters of the EM algorithm are also improved. A "weighted hierarchical coefficient" is proposed to calculate the importance of each feature node using decision trees, and the internal tendency of subsets of the dataset is studied by boosting.
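
The Canopy-then-K-means idea described in the abstract can be illustrated with a minimal sketch. It uses the classic Canopy procedure, not the thesis's improved variant, whose details the abstract does not give. The function names (`canopy_centers`, `kmeans`), the thresholds `t1` and `t2`, and the toy data are all assumptions made for illustration only.

```python
import math


def mean_point(members):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(members) for col in zip(*members))


def canopy_centers(points, t1, t2):
    """Classic Canopy (assumed stand-in for the thesis's improved variant).

    t1 is the loose threshold, t2 the tight one (t1 > t2). The number of
    canopies found serves as the cluster count k, and the mean of each
    canopy serves as an initial K-means center.
    """
    if t1 <= t2:
        raise ValueError("t1 must be greater than t2")
    remaining = list(points)
    centers = []
    while remaining:
        seed = remaining.pop(0)
        # Points within the loose threshold belong to this canopy.
        members = [seed] + [p for p in remaining if math.dist(seed, p) < t1]
        centers.append(mean_point(members))
        # Points within the tight threshold can no longer seed a canopy.
        remaining = [p for p in remaining if math.dist(seed, p) >= t2]
    return centers


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points]


def kmeans(points, init_centers, max_iter=100):
    """Plain K-means (Lloyd's iterations) started from the given centers."""
    centers = [tuple(c) for c in init_centers]
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old center if a cluster ends up empty.
            new_centers.append(mean_point(members) if members else old)
        if new_centers == centers:
            break
        centers = new_centers
    return assign(points, centers), centers


# Toy data: two well-separated groups (hypothetical, not from the thesis).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
init_centers = canopy_centers(points, t1=6.0, t2=3.0)
labels, centers = kmeans(points, init_centers)
```

Here the number of canopies fixes k, so K-means does not need the cluster count supplied in advance. In practice the thresholds would have to be tuned to the data.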

Keywords/Search Tags:

Clustering, Risk Factors, K-Means, CHD

Related items

1	Detection Of Arterial Input Function From Cerebral Perfusion Using DSC-MRI Based On Clustering Analysis
2	Relative Technology And Realization Of DNA Microarray Images Recognition
3	The Research On Application Of K-means Clustering Algorithm In SIR Infectious Disease Model
4	Research Of Clustering Strategies For Dynamic Electrocardiogram Waveform
5	Research On Weighted Clustering Algorithm Based On Tumor Gene Expression Data
6	Research On Brain MR Image Segmentation Algorithm Based On Fuzzy C-means Clustering
7	Construction And Implementation Of A Cloud Nursing Health Service Platform Based On K-means Algorithm
8	Study On The Diagnosis Method Of Ventricular Premature Beat Based On Fuzzy C-means
9	Study On The Risk Prediction Model Of Elderly Patients Before Operation Based On Cluster Learning
10	The Research Of Cerenkov Luminescence Tomography Algorithm Based On Unsupervised Clustering