
Study On Affinity Propagation Clustering Algorithm And Its Application In Mining Electronic Medical Records

Posted on: 2018-03-09 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: L L Sun | Full Text: PDF
GTID: 1314330512467554 | Subject: Management Science and Engineering
Abstract/Summary:
In the era of big data, how to extract knowledge from data is a very active topic. Exemplar-based clustering is an effective unsupervised learning method that can obtain knowledge from unlabeled data, and it has been used successfully in many practical applications such as customer segmentation, community detection, discovery of opinion leaders, and detection of abnormal purchase behavior. Exemplar-based clustering has been studied for more than 50 years, since K-means was first proposed. The problem has also been widely studied in other fields, for example as the facility location problem in operations research. However, exemplar-based clustering has been proven to be NP-hard, which means that finding a good solution is very difficult.

Affinity Propagation (AP) clustering is a recently proposed exemplar-based clustering algorithm. Compared with previous exemplar-based clustering algorithms, AP achieves better clustering performance on most of the studied data sets. In addition, AP needs no initial exemplar set, and not even the number of clusters has to be specified. Because of these advantages, AP has attracted much attention and has been widely used in the analysis of text data, image data, and gene expression data.

However, data science has developed rapidly in the past two or three years, and the collected data sets have become more and more complex in type, structure, and volume. In this setting, the standard AP clustering algorithm faces several challenges: 1) standard AP is mainly designed for analyzing static data, but dynamic data has taken a more important position in data science; 2) standard AP can only find spherical clusters, but the distributions of data objects have become more and more varied; 3) standard AP has rarely been used in large-scale clustering problems because of its high computational complexity, but the volume of data is increasing exponentially in nearly every field.

For these three problems, this thesis provides separate solutions and proposes: 1) two incremental AP clustering algorithms, which extend the application of AP to dynamic data environments; 2) nonspherical AP clustering, which can discover clusters with arbitrary shapes; 3) fast AP clustering, which can deal with large-scale clustering problems. Additionally, this thesis uses the proposed algorithms to mine electronic medical records. The main contributions can be summarized as follows:

1. Incremental AP clustering. This thesis first points out why standard AP cannot be used in incremental clustering problems, then provides two strategies for extending standard AP to dynamic data environments. Based on these, two incremental AP clustering algorithms are proposed: Incremental AP Clustering based on K-Medoids (IAPKM) and Incremental AP Clustering based on Nearest-neighbor Assignment (IAPNA). The thesis analyzes the theoretical rationality of IAPKM and IAPNA and also tests the two algorithms in computational experiments on real-world data sets. The experimental results validate the effectiveness of the two proposed incremental AP clustering algorithms.

2. Arbitrary-shaped AP clustering. This thesis first analyzes the existing arbitrary-shaped clustering algorithms, then points out the differences between feature similarity and category similarity. A method based on Laplacian eigenmaps is proposed to transform feature similarity into category similarity. Based on the constructed category similarity matrix, arbitrary-shaped AP clustering is proposed. Several synthetic data sets and two practical machine learning tasks, face recognition and image segmentation, are used to test the performance of the proposed method. The experimental results demonstrate that the proposed arbitrary-shaped AP clustering algorithm can find clusters with complex shapes.

3. Fast AP clustering. This thesis proposes a two-stage enhanced fast AP clustering algorithm. The similarity matrix is first compressed by selecting a number of potential exemplars, then sparsified by removing similarities between distant objects. A factor graph is constructed from the simplified similarity matrix, and message passing is carried out on the incomplete similarity matrix. The proposed fast AP clustering greatly improves the computational efficiency of the AP algorithm with only a tiny decrease in clustering performance.

4. Typical treatment regimen discovery and recommendation. The three clustering algorithms proposed above are used to mine electronic medical records. First, IAPKM and IAPNA are used to divide patients into cohorts according to their demographic and diagnostic information. Next, the arbitrary-shaped AP clustering algorithm is used to analyze the combined network of drug usage. Finally, fast AP clustering is used to extract typical treatment regimens from treatment records. By matching the patient cohorts, the clustering result of the treatment records, and the treatment outcomes, the most effective treatment regimen for a specified patient cohort is found.

This thesis has theoretical significance and practical value: 1) the extensions of standard AP allow it to be applied to many new kinds of data, which also provides three new tools for data science; 2) most of the strategies used to improve AP address classic clustering problems, so this work may also inspire improvements to other clustering algorithms; 3) the typical treatment regimens extracted from large-volume electronic medical records can help doctors better understand the design of treatment regimens, and the regimen recommendation can support doctors' clinical decisions.
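For orientation, the standard AP message passing that the abstract takes as its starting point can be sketched as below. This is a minimal illustration of the well-known responsibility/availability updates, not the thesis's implementation and not any of its proposed variants (IAPKM, IAPNA, arbitrary-shaped AP, or fast AP). The function names, the damping factor of 0.5, the fixed iteration count, the median preference, and the one-dimensional toy data are all assumptions made for this example.

```python
from statistics import median


def similarity_matrix(points):
    """Negative squared distance between 1-D points (toy data, an assumption).

    The diagonal holds the 'preference'; the median of the off-diagonal
    similarities is used here, a common default rather than a rule.
    """
    n = len(points)
    S = [[-(points[i] - points[j]) ** 2 for j in range(n)] for i in range(n)]
    preference = median(S[i][j] for i in range(n) for j in range(n) if i != j)
    for i in range(n):
        S[i][i] = preference
    return S


def affinity_propagation(S, damping=0.5, iterations=200):
    """Plain AP on a dense similarity matrix S (needs at least 2 objects).

    Returns (exemplars, labels), where labels[i] is the index of the
    exemplar that object i is assigned to.
    """
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # r(i,k) = s(i,k) - max over k' != k of (a(i,k') + s(i,k'))
        for i in range(n):
            AS = [A[i][k] + S[i][k] for k in range(n)]
            first = max(range(n), key=lambda k: AS[k])
            best = AS[first]
            second = max(AS[k] for k in range(n) if k != first)
            for k in range(n):
                new = S[i][k] - (second if k == first else best)
                R[i][k] = damping * R[i][k] + (1 - damping) * new
        # a(i,k) = min(0, r(k,k) + sum of positive r(i',k), i' not in {i,k})
        # a(k,k) = sum of positive r(i',k), i' != k
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            pos[k] = R[k][k]  # the self-responsibility enters unclipped
            total = sum(pos)
            for i in range(n):
                new = total - pos[i]
                if i != k:
                    new = min(0.0, new)
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    # Object k is an exemplar when r(k,k) + a(k,k) is positive.
    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    if not exemplars:
        return [], []
    labels = [
        i if i in exemplars else max(exemplars, key=lambda e: S[i][e])
        for i in range(n)
    ]
    return exemplars, labels


points = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
exemplars, labels = affinity_propagation(similarity_matrix(points))
print(exemplars, labels)
```

Each iteration of this sketch touches every entry of the dense n-by-n similarity matrix, which is the cost that motivates the compression and sparsification steps described under fast AP clustering. The sketch also needs the full matrix up front, which is the static-data limitation noted above.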
Keywords/Search Tags: Clustering Algorithm, Affinity Propagation, Exemplar-based Clustering, Electronic Medical Records, Data Mining