Research on Named Entity Normalization in Biomedical Literature Based on Extended Semantic Profiling Disambiguation

Posted on: 2010-03-13  Degree: Master  Type: Thesis
Country: China  Candidate: N Xia
GTID: 2178360302460415  Subject: Computer application technology
Abstract/Summary:
As the volume of biomedical literature grows sharply, the sheer amount and variety of biomedical information has become a bottleneck for researchers. They can hardly retrieve the valuable information buried in this sea of text in a timely manner, and they struggle to keep their knowledge up to date. Meanwhile, the biomedical domain has abundant online and offline resources; the question is how to make full use of them to support research, and how to represent existing knowledge so that new knowledge can be learned from it. Such efforts allow the resources to be updated for researchers' further study. However, constructing a knowledge source usually costs a great deal of time and money, and its coverage is limited by the knowledge of its builders. Researchers in the domain therefore need a method that addresses the gap between the ever-increasing amount of literature and the slow pace at which their own knowledge is updated. Biomedical named entity normalization has emerged in response to this need.
Biomedical named entity normalization is a critical and fundamental component of biomedical text mining. It takes the output of a biomedical named entity recognition system and assigns each recognized entity to the correct database identifier, which in turn supports downstream tasks such as entity interaction extraction and implicit knowledge discovery. Genes and proteins are among the most important biomedical entities and play a crucial part in biomedical research, so research on biomedical named entity normalization focuses on gene mention normalization. The goal of gene mention normalization is to recognize the genes and proteins mentioned in biomedical literature and map these mentions to database identifiers. This can reduce the cost of resource construction and therefore has practical value.
In this thesis, we first review related work on gene mention normalization in the biomedical domain. We then narrow our scope to retrieving and representing knowledge that facilitates disambiguation. As a first attempt, we apply a method based on relevance feedback to gene mention normalization. Building on a close study of related work, we propose an extended semantic profiling disambiguation method for gene mention normalization, which consists of four steps. In the first step, we preprocess the original documents and recognize gene mentions with an existing named entity recognition system. At the same time, we combine the dictionary provided by the organizers with synonym information from database resources to generate our own dictionary, and we normalize morphological differences to eliminate errors caused by synonym variants. The second step tackles the mapping between gene mentions and database identifiers. In the third step, we use extended semantic profile information obtained through information retrieval for disambiguation, and then use this information as features for machine learning. In the fourth step, we apply a Wikipedia-based post-filter to rule out false positives. We evaluate the system on the BioCreative I and II datasets, where it achieves comparable results. Finally, we discuss the method in light of the experiments, outline future work, and analyze the feasibility of further improvements.
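To make the first three steps more concrete, the following is a minimal Python sketch, not the system described in the thesis. It assumes a toy synonym dictionary and hand-written gene profiles; the identifiers `GENE:0001` and `GENE:0002`, the profile texts, and all function names are hypothetical. A plain cosine similarity between the mention's context and each candidate's profile stands in for the retrieval-based extended semantic profiles and the machine learning stage, and the Wikipedia-based post-filter is omitted.

```python
import math
import re
from collections import Counter


def normalize_mention(text):
    """Reduce morphological variation: case, hyphens, letter/digit spacing."""
    text = text.lower()
    text = re.sub(r"[-_/]", " ", text)
    text = re.sub(r"([a-z])(\d)", r"\1 \2", text)  # "il2" -> "il 2"
    text = re.sub(r"(\d)([a-z])", r"\1 \2", text)  # "2a"  -> "2 a"
    return " ".join(text.split())


def build_lexicon(synonyms):
    """Map each normalized synonym to the set of identifiers that use it."""
    lexicon = {}
    for gene_id, names in synonyms.items():
        for name in names:
            lexicon.setdefault(normalize_mention(name), set()).add(gene_id)
    return lexicon


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def disambiguate(mention, context, lexicon, profiles):
    """Return the candidate whose profile best matches the context, or None."""
    candidates = lexicon.get(normalize_mention(mention), set())
    if not candidates:
        return None
    ctx = Counter(tokenize(context))
    # sorted() makes tie-breaking deterministic
    return max(
        sorted(candidates),
        key=lambda gene_id: cosine(ctx, Counter(tokenize(profiles[gene_id]))),
    )


# Toy data: hypothetical identifiers and illustrative profile texts.
SYNONYMS = {
    "GENE:0001": ["CAT", "catalase"],
    "GENE:0002": ["CAT", "chloramphenicol acetyltransferase"],
}
PROFILES = {
    "GENE:0001": "enzyme hydrogen peroxide oxidative stress peroxisome",
    "GENE:0002": "bacterial antibiotic resistance reporter acetyltransferase",
}
LEXICON = build_lexicon(SYNONYMS)

if __name__ == "__main__":
    sentence = "CAT activity protected cells from hydrogen peroxide during oxidative stress"
    print(disambiguate("CAT", sentence, LEXICON, PROFILES))
```

Here the ambiguous mention "CAT" has two candidate identifiers after dictionary lookup, and the context words shared with a candidate's profile decide between them. In a fuller system the similarity score would be one feature among several rather than the sole decision rule.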
Keywords/Search Tags: Gene Mention Normalization, Disambiguation, Extended Semantic Profile, Machine Learning