
A Study Of Gene Name Normalization And Functional Prediction Based On Semantic Resource

Posted on: 2012-06-29
Degree: Master
Type: Thesis
Country: China
Candidate: Y C Hu
Full Text: PDF
GTID: 2218330368988079
Subject: Computer application technology
Abstract/Summary:
Reading the large volume of literature in biomedical databases takes too much time and effort. If computers can help recognize biological entities and extract the interactions between them, such as those between proteins and genes, this will greatly aid research and information management in biomedicine. Our research therefore focuses on two issues. (1) Because named entities are not mentioned in a standardized way, biomedical papers contain many homonyms and synonyms. We study how to distinguish between the different senses of a gene name and map it to a uniform format. (2) Gene Ontology functional annotation helps to explain phenomena in the life sciences. We investigate how to predict gene functions with the help of text mining methods and existing resources.

First, we present a normalization method that integrates biomedical resources for disambiguation. Traditional gene name normalization methods suffer because the descriptions of gene symbols in biomedical databases are neither rich nor complete, so it is hard to choose among the candidate gene symbols for an ambiguous gene name. In our method, extended semantic information is extracted for each gene symbol from Gene Ontology data and MEDLINE abstracts, and the unique identifier that expresses the actual meaning of the named entity is determined by the similarity between the context information and the extended semantic description. The results show that incorporating these external resources makes the similarity measure more informative and yields better performance.

We treat the second issue, gene function prediction, as a hierarchical multi-label classification problem. The hierarchical relationships in the Gene Ontology are used to adjust the training samples and relieve the imbalance between positive and negative training samples. Meanwhile, the discriminating ability of the classifiers is enhanced by preserving the easily confused training samples. Additionally, the top-down classifier based on a tree structure takes the relationships among target classes into account and thus resolves the incompatibility between the classification results and the Gene Ontology structure. The experimental results demonstrate that when the training set is small, it can be enlarged by topological propagation between parent and child nodes. The top-down classification model can be applied to any text collection organized in an ontology or other hierarchical structure.

In our research on gene name normalization and gene function prediction, we found that the rich semantic knowledge and the hidden structural information in biomedical resources can provide great support for the development of text mining technology in biomedicine.
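
The disambiguation step in the normalization method can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, so cosine similarity over raw token counts is assumed here. The function names (`tokenize`, `cosine`, `normalize_mention`), the identifiers `GENE:0001` and `GENE:0002`, and their descriptions are all hypothetical and stand in for the extended semantic descriptions that would be built from Gene Ontology data and MEDLINE abstracts.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def normalize_mention(context, candidates):
    """Pick the candidate identifier whose description best matches the context.

    `candidates` maps an identifier to its extended semantic description.
    Returns the best identifier and the score of every candidate.
    """
    context_vec = Counter(tokenize(context))
    scores = {
        gene_id: cosine(context_vec, Counter(tokenize(description)))
        for gene_id, description in candidates.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical candidates for one ambiguous gene mention (invented data).
candidates = {
    "GENE:0001": "kinase involved in cell cycle regulation and mitosis",
    "GENE:0002": "membrane transporter for glucose uptake in muscle tissue",
}
context = "The protein regulates mitosis and cell cycle progression in yeast."

best_id, scores = normalize_mention(context, candidates)
print(best_id)
```

In this sketch, a richer description for each candidate gives the context more tokens to match, which is the intuition behind extending the descriptions with external resources. A real system would likely weight terms (for example with TF-IDF) and remove stop words, which this sketch does not do.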
Keywords/Search Tags:Gene Normalization, Disambiguation, Functional Prediction, Hierarchical Classification