
Automated methods of auditing and using terminology/ontology knowledge bases for natural language processing

Posted on: 2010-10-19
Degree: Ph.D.
Type: Dissertation
University: Columbia University
Candidate: Fan, Jung-Wei
Full Text: PDF
GTID: 1448390002979849
Subject: Biology
Abstract/Summary:
Because communicating in natural language is part of our cognitive nature, narrative information plays a critical role in storing and disseminating knowledge. In a knowledge-intensive domain such as biomedicine, digesting the huge amount of text in clinical reports, research literature, and consumer websites is extremely demanding. Biomedical natural language processing (BioNLP) is an informatics specialty that aims to automatically analyze and restructure biomedical text into a more digestible size and format so that it can be easily post-processed by humans or other automated programs. To handle the comprehensive lexical and semantic knowledge in biomedicine, BioNLP systems need to incorporate domain-specific terminology/ontology knowledge bases. In addition, using standardized lexical/semantic entities benefits interoperability between BioNLP systems and associated applications. However, two major issues hinder the optimal use of terminology/ontology for BioNLP: first, the existing terminology/ontology knowledge bases are not customized for NLP purposes and contain problematic content; second, automated solutions for improving and using the knowledge bases are still inadequate, which limits their use in BioNLP.

To address these issues, the dissertation proposes corresponding solutions, both to improve terminology/ontology for BioNLP purposes and to demonstrate the feasibility of using terminology/ontology in BioNLP applications. For the first task, two automatic classifiers were developed to reclassify and audit the semantic classification of terminology concepts. The classifiers use empirical language features and complement other auditing methods that apply ontological principles. For the second task, we developed unsupervised methods that use terminology/ontology for word sense disambiguation (WSD). The methods can help reduce the labor of manual annotation and sample representative evaluation instances for WSD research. Promising results have been achieved in both tasks, and we have made the reclassified concepts available as a public database for the community. The results also enhanced our understanding of the biomedical terminology/ontology knowledge bases and pointed out interesting directions for future research. The methods developed in the dissertation can be generalized to other fields and should promote the use of standardized terminology/ontology in biomedicine and healthcare.
Keywords/Search Tags:Terminology/ontology, Natural language, Methods, Using, Automated