
Biomedical Text Clustering Algorithm Research And Applications

Posted on: 2010-09-10, Degree: Master, Type: Thesis
Country: China, Candidate: W Yuan, Full Text: PDF
GTID: 2208360275491461, Subject: Computer software and theory
Abstract/Summary:
Biomedical research is among the most closely watched research areas of the twenty-first century, and researchers have published a great many papers, reaching an average of more than 100,000. Researchers in these fields face a growing challenge in mining the related literature effectively. As a branch of bioinformatics, biomedical text mining is an efficient, automatic means of accessing newly discovered knowledge, and it has made substantial progress in recent years. Making effective use of the biomedical knowledge contained in these texts is undoubtedly very important for the analysis of massive biomedical data. The commonly used approach is to search for keywords in MEDLINE or on the Internet, but such a search returns only a large list of relevant documents rather than useful information extracted directly from the text that the user is interested in. An effective tool for automatically extracting knowledge from large-scale biomedical literature is therefore urgently needed. This thesis proposes an ensemble clustering method and applies it to clustering biomedical text. It also uses a semi-supervised clustering method based on metric learning to cluster biomedical literature. Our work and contributions can be summarized as follows:

a) We introduce the background of biomedical documents and current work on mining biomedical literature, and review research on clustering methods and their application to biomedical literature. We also clarify the problems of current clustering methods by analyzing their reliability and parameters, and put forward a solution: ensemble clustering.

b) Building on a thorough study of clustering ensembles, we examine the relationship between the number of base clusters in each clustering and the quality of the final result, and propose an improved algorithm that raises the accuracy of the clustering ensemble. First, based on the idea of real diversity among clusters, we define a formula to measure this diversity. Second, we examine through experiments whether differences in the number of base clusters affect the ensemble result. The experimental data show that the improved algorithm is more accurate than the original algorithm.

c) We use the MeSH ontology as background knowledge to improve clustering. Medical Subject Headings (MeSH) are used by the United States National Library of Medicine to index biomedical journal literature and serve as the subject index vocabulary for searching its MEDLINE database; their hierarchical structure contains a wealth of biological knowledge. This thesis therefore proposes a clustering algorithm based on the distance between MeSH terms. Finally, a general comparison with current methods for clustering biomedical literature shows that our method achieves better clustering results.
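
To make the ensemble clustering idea in contribution (b) concrete, here is a minimal sketch of one common way to combine several base clusterings: a co-association matrix followed by thresholded linking. It is not the thesis's improved algorithm or its diversity formula, neither of which is given in this abstract. The function names, the `threshold` parameter, and the toy partitions are assumptions made purely for illustration.

```python
from itertools import combinations


def co_association(partitions):
    """Fraction of base partitions in which each pair of items shares a cluster."""
    n = len(partitions[0])
    m = len(partitions)
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / m for c in row] for row in counts]


def consensus_clusters(partitions, threshold=0.5):
    """Link items whose co-association exceeds `threshold` (an assumed
    parameter) and return the connected components as consensus labels."""
    n = len(partitions[0])
    co = co_association(partitions)
    parent = list(range(n))  # union-find forest over the items

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if co[i][j] > threshold:
            parent[find(i)] = find(j)

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Three hypothetical base clusterings of six documents.
base_partitions = [
    [0, 0, 1, 1, 2, 2],
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 2, 2],
]
print(consensus_clusters(base_partitions))
```

Pairs of documents that most base clusterings place together end up in the same consensus cluster, so a single unreliable base clustering (the second one here) has limited influence on the final grouping.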
Keywords/Search Tags: Data Mining, Clustering, Ensemble Clustering, Biomedical Text, Metric Learning, MeSH