Biomedical Text Clustering Algorithm Research And Applications

Posted on:2010-09-10

Degree:Master

Type:Thesis

Country:China

Candidate:W Yuan

Full Text:PDF

GTID:2208360275491461

Subject:Computer software and theory

Abstract/Summary:

Biomedical research is one of the most closely watched research areas of the twenty-first century, and the volume of published papers has reached an average of more than 100,000. Researchers in these fields face a growing challenge in mining the relevant literature effectively. As a branch of bioinformatics, biomedical text mining provides efficient automatic tools for acquiring newly discovered knowledge, and it has made substantial progress in recent years. Making effective use of the biomedical knowledge contained in these texts is very important for the analysis of massive biomedical data. The common approach is to search by keyword in MEDLINE or on the Internet, but this only returns a large list of relevant documents; it does not let users obtain the information they are interested in directly from the text. A tool that automatically extracts knowledge from large-scale biomedical literature is therefore urgently needed.

This thesis proposes an ensemble clustering method and applies it to biomedical text clustering. It also uses a semi-supervised clustering method based on metric learning to cluster biomedical literature. The work and contributions can be summarized as follows:

a) We introduce the background of biomedical documents and current work on mining biomedical literature, and review research on clustering methods and their application to biomedical literature. We also clarify the problems of current clustering methods by analyzing their reliability and parameters, and put forward ensemble clustering as a solution.

b) Building on a thorough study of clustering ensembles, we examine the relationship between the number of base clusters and the quality of the final result, and propose an improved algorithm that raises the accuracy of the clustering ensemble. First, based on the idea of real diversity among clusters, we define a formula to measure this diversity. Second, we examine experimentally whether differences in the number of base clusters affect the ensemble result. The experimental data show that the improved algorithm is more accurate than the original algorithm.

c) We use the MeSH ontology as background knowledge to improve clustering. Medical Subject Headings (MeSH) are used by the United States National Library of Medicine to index biomedical journal literature and serve as the subject index vocabulary for searching its MEDLINE database; their hierarchical structure contains a wealth of biological knowledge. We therefore propose a clustering algorithm based on the distance between MeSH terms. A general comparison with current methods for biomedical literature clustering shows that our method achieves better clustering results.
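
To illustrate the general idea of ensemble clustering mentioned in b), the following Python sketch combines several base clusterings through a co-association matrix. This is an assumption made for illustration only: the abstract does not state which consensus function the thesis uses, and the function names `co_association` and `consensus_clusters`, the 0.5 threshold, and the toy label vectors are all hypothetical.

```python
from itertools import combinations


def co_association(partitions):
    """Return an n x n matrix whose (i, j) entry is the fraction of base
    clusterings that place items i and j in the same cluster."""
    n = len(partitions[0])
    counts = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1.0
    m = len(partitions)
    return [[c / m for c in row] for row in counts]


def consensus_clusters(partitions, threshold=0.5):
    """Hypothetical consensus step: link every pair whose co-association
    exceeds `threshold`, then return the connected components as labels."""
    co = co_association(partitions)
    n = len(co)
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if co[i][j] > threshold:
            parent[find(i)] = find(j)

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


# Three toy base clusterings of five documents (illustrative data only).
base = [
    [0, 0, 1, 1, 2],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
]
final_labels = consensus_clusters(base)
```

Label values in the base clusterings need not agree with one another, since only co-membership of pairs is counted. A real system would obtain the base clusterings from repeated runs of a clustering algorithm on document vectors rather than from hand-written labels.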

Keywords/Search Tags:

Data Mining, Clustering, Ensemble Clustering, Biomedical text, Metric learning, MeSH

Related items

1	Research Of Clustering Algorithm Based On Web Text Mining
2	Research On Key Technologies Of Clustering Ensemble
3	Research On Clustering Algorithms Based On Metric Learning For Complex Data
4	Semi Supervised Clustering Algorithm And Its Application And Research
5	Research On Co-association Matrix Based Clustering Ensemble Algorithm
6	Research On Ensemble-Initialized K-Means Clustering Algorithms
7	Research On Clustering Ensemble And Semi-Supervised Clustering In Data Mining
8	Research On Clustering Algorithms In Traffic Domain
9	Research On Fuzzy Clustering And Clustering Ensemble In Data Mining
10	Research On The Effectiveness Element Theory And Method Of Clustering Ensemble