Full Biomedical Texts Clustering And Relationship Network Analysis

Posted on: 2015-01-23
Degree: Master
Type: Thesis
Country: China
Candidate: Y Chen
GTID: 2298330467953638
Subject: Software engineering
Abstract:
Owing to the rapid development of biology and medical science, automatically clustering the fast-growing number of biomedical articles has become necessary for making effective use of this large body of scientific information. However, most current studies focus only on the abstracts of scientific publications. In this thesis, we extend the study to biomedical full texts. To address the curse of dimensionality, we first compute the cosine coefficient in the subspace spanned by each pair of document vectors, instead of computing Euclidean distances over the full vector space model. We then introduce Semi-supervised Affinity Propagation (SSAP) to further improve efficiency. To evaluate performance on real biomedical articles, and to overcome the lack of ground-truth labels for these data, we use the unique ISSN of each article's journal as the ground truth. Experimental results show that SSAP avoids computation on high-dimensional sparse matrices, outperforms traditional k-means, and improves on the standard Affinity Propagation algorithm. Moreover, based on the SSAP clustering results, we construct a directed relationship network and a cluster distribution matrix for the biomedical corpus. From the network and the matrix, the overlaps in publishing scope and the shared interests of these BioMed journals can be readily identified.
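The angle between two vectors is the same in the two-dimensional subspace they span as in the full term space, so the cosine coefficient of two articles depends only on the terms that occur in at least one of them. The following minimal Python sketch shows this with sparse term-frequency vectors. It assumes whitespace tokenisation and raw term counts, whereas the thesis may use TF-IDF or another weighting. The helper names `term_vector` and `cosine` are hypothetical.

```python
import math
from collections import Counter

def term_vector(text):
    """Sparse term-frequency vector: only terms that actually occur are stored."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine coefficient of two sparse vectors, computed without a global vocabulary."""
    # Iterate over the smaller vector; terms missing from either side contribute 0.
    if len(u) > len(v):
        u, v = v, u
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)
```

For example, "protein binding site protein" and "protein folding site" share the terms "protein" and "site". Their dot product is 3 and their norms are sqrt(6) and sqrt(3), so the cosine is 3 / sqrt(18), about 0.707. No matrix over the whole corpus vocabulary is ever built.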
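Using each article's journal ISSN as the reference label turns clustering quality into an external-validation problem. The abstract does not name the measure used, so the sketch below assumes purity as one plausible choice. Purity credits each cluster with the count of its most frequent journal and divides the total by the number of articles. The function name and the placeholder ISSN strings in the usage note are hypothetical.

```python
from collections import Counter, defaultdict

def purity(cluster_labels, issn_labels):
    """Fraction of articles whose journal (ISSN) is the majority journal of their cluster."""
    members = defaultdict(list)
    for cluster, issn in zip(cluster_labels, issn_labels):
        members[cluster].append(issn)
    # For each cluster, count how many articles belong to its most common journal.
    majority_total = sum(Counter(issns).most_common(1)[0][1] for issns in members.values())
    return majority_total / len(issn_labels)
```

For clusters `[0, 0, 0, 1, 1, 1]` with journals `["ISSN-A", "ISSN-A", "ISSN-B", "ISSN-B", "ISSN-B", "ISSN-B"]`, cluster 0 is credited with 2 articles and cluster 1 with 3, giving a purity of 5/6.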
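A cluster distribution matrix can be read as a journal-by-cluster contingency table: entry (j, c) counts how many articles of journal j fall into cluster c. The sketch below builds that table and then derives a directed edge j -> m, weighted by the number of j's articles that land in clusters whose majority journal is m. This edge rule is a hypothetical illustration of how publishing-scope overlap could be exposed. It is not the network construction used in the thesis, and all function names are illustrative.

```python
from collections import Counter

def distribution_matrix(cluster_labels, journals):
    """Rows = journals, columns = clusters, entry = article count of that journal in that cluster."""
    rows = sorted(set(journals))
    cols = sorted(set(cluster_labels))
    counts = Counter(zip(journals, cluster_labels))
    matrix = [[counts[(j, c)] for c in cols] for j in rows]
    return rows, cols, matrix

def directed_edges(rows, cols, matrix):
    """HYPOTHETICAL rule: edge j -> m weighted by j's articles in clusters dominated by m (m != j)."""
    # Majority journal of each cluster; ties go to the first journal in sorted order.
    owner = {}
    for ci, c in enumerate(cols):
        best = max(range(len(rows)), key=lambda ri: matrix[ri][ci])
        owner[c] = rows[best]
    edges = Counter()
    for ri, j in enumerate(rows):
        for ci, c in enumerate(cols):
            m = owner[c]
            if m != j and matrix[ri][ci] > 0:
                edges[(j, m)] += matrix[ri][ci]
    return dict(edges)
```

For clusters `[0, 0, 0, 1, 1, 1]` and journals `["A", "A", "B", "B", "B", "B"]`, the matrix is `[[2, 0], [1, 3]]`. Cluster 0 is dominated by A and cluster 1 by B, so the only edge is B -> A with weight 1. That edge reflects the one B article that was grouped with A's material.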
Keywords/Search Tags: Biomedical text clustering, k-means, Affinity Propagation, directed relationship network, cluster distribution matrix