Research And Application Of Clustering Analysis Algorithm Based On Mixed Attribute Data

Posted on: 2019-07-25
Degree: Master
Type: Thesis
Country: China
Candidate: M M Xu
Full Text: PDF
GTID: 2428330545460160
Subject: Applied Mathematics
Abstract/Summary:
Data mining technology can extract valuable information from large, irregular data sets. It is the result of the natural evolution of information technology and meets people's need to search for useful information. Clustering analysis is a widely used tool in data mining that can discover interesting knowledge in data sets without prior information. Its goal is to divide a data set into several classes, or clusters, so that the similarity between objects in the same cluster is as large as possible while the similarity between objects in different clusters is as small as possible. Driven by practical application requirements, researchers have proposed a variety of clustering algorithms and achieved many results in biomedicine, customer relationship management, image processing, pattern recognition and other fields. However, most real-world data are mixed data composed of numerical attributes and categorical attributes, and most of the algorithms that can handle this kind of data perform poorly and produce low clustering quality. Clustering of mixed attribute data has therefore become a hot issue in the field of clustering analysis.

To improve both the accuracy and the efficiency of clustering, this thesis studies the existing clustering algorithms for mixed attribute data and addresses two problems in the k-prototypes algorithm: the selection of the initial cluster centers and the dissimilarity measure for categorical attributes. An improved k-prototypes algorithm based on average difference degree is proposed. First, the initial cluster centers are selected by using the average difference degree, which avoids the uncertainty of the initial cluster centers selected by the k-prototypes algorithm. Second, the mixed attribute dissimilarity formula in the k-prototypes algorithm ignores the importance of the numerical attributes and cannot use the information in the clusters effectively. In particular, as the amount of data grows and the attribute types become more complex, it cannot fully reflect the differences between data objects and clusters. To resolve these problems, the weights of the numerical attributes are determined with information entropy, improving the efficiency of the algorithm. In addition, the dissimilarity formula for categorical attributes is improved so that data objects can be assigned more accurately to the clusters they belong to, and a dissimilarity formula for mixed attribute data is then given.

To verify the effectiveness of the improved algorithm, simulation experiments are performed on real data sets and the results are compared with those of other clustering algorithms. The experimental results show that the improved algorithm has higher accuracy and stability. Finally, the improved clustering algorithm is applied to the analysis of medical data sets. Dermatological data sets are clustered to identify the patient's disease type. Diagnostic data sets of patients with heart disease are clustered to analyze the patients' indicators and to predict whether a patient is at risk of heart disease. The results show that the algorithm has good application prospects in medical data analysis.
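The abstract does not reproduce the thesis's formulas, so the following is only a minimal Python sketch of the general idea, not the author's method. It assumes the common entropy-weight method for the numerical attributes and the standard k-prototypes form of the mixed dissimilarity (weighted squared difference on numerical attributes plus `gamma` times a simple-matching count on categorical attributes). The function names, the `gamma` parameter and the simple-matching term are illustrative assumptions; the thesis's improved categorical measure and its average-difference-degree initialization are not shown.

```python
import math


def entropy_weights(rows):
    """Assumed entropy-weight method: one weight per numerical column.

    `rows` is a list of equal-length lists of numbers with at least two rows.
    Columns whose values are spread more unevenly get larger weights.
    """
    n = len(rows)
    m = len(rows[0])
    raw = []
    for j in range(m):
        col = [r[j] for r in rows]
        lo, hi = min(col), max(col)
        span = hi - lo
        # Min-max normalise; a constant column becomes all ones.
        norm = [(v - lo) / span if span > 0 else 1.0 for v in col]
        total = sum(norm)
        probs = [v / total for v in norm]
        # Shannon entropy scaled to [0, 1]; 0 * log(0) is treated as 0.
        ent = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n)
        raw.append(max(0.0, 1.0 - ent))
    s = sum(raw)
    if s <= 1e-12:
        # Every column is uninformative: fall back to equal weights.
        return [1.0 / m] * m
    return [d / s for d in raw]


def mixed_dissimilarity(x_num, x_cat, c_num, c_cat, weights, gamma=1.0):
    """Weighted k-prototypes-style dissimilarity between an object and a prototype.

    Numerical part: weighted squared differences.
    Categorical part: simple matching (count of mismatches), scaled by gamma.
    """
    numeric = sum(w * (a - b) ** 2 for w, a, b in zip(weights, x_num, c_num))
    categorical = sum(1 for a, b in zip(x_cat, c_cat) if a != b)
    return numeric + gamma * categorical


if __name__ == "__main__":
    data = [[1.0, 5.0], [2.0, 5.0], [3.0, 5.0]]
    w = entropy_weights(data)
    print(w)
    print(mixed_dissimilarity([1.0, 2.0], ["a", "b"], [2.0, 2.0], ["a", "c"], [0.5, 0.5], gamma=2.0))
```

In this sketch a constant column carries no information and receives a weight of approximately zero, so it contributes almost nothing to the numerical part of the distance.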
Keywords/Search Tags: Data mining, Clustering, K-prototypes algorithm, Mixed attribute data, Average difference degree