Investigation of term classification with applications to sortal anaphora resolution in the biology domain

Posted on: 2007-01-14
Degree: Ph.D.
Type: Dissertation
University: University of Delaware
Candidate: Torii, Manabu
GTID: 1455390005981246
Subject: Computer Science
Abstract/Summary:
Recent developments in high-throughput experimental methods in biology have produced a proliferation of data and new findings. Much of this information is available only through the scientific literature. To make use of the biological knowledge stored in these constantly growing archives, text processing and automated information extraction have gained popularity in the language processing and information extraction research communities as well as in the biological research community.

In this dissertation, we focus on two tasks that play important roles in various text mining applications. First, we investigated the assignment of semantic types to biological terms. We developed automated methods to extract both term-internal and contextual clues for term type identification over the GENIA corpus. We exploited these features in a machine learning setting to achieve state-of-the-art term classification results.

In the second part of the dissertation, we investigated sortal anaphora resolution in the biology domain. Sortal anaphora are phrases (e.g., "the protein", "this kinase", and "these cell lines") that contain sortal, or type, information, and they are the most prevalent form of anaphora in biological texts. To develop a robust resolution system, we used a machine learning approach, specifically maximum entropy modeling. Since we were not aware of any annotated corpus suitable for training models in the biology domain, we developed a corpus consisting of a set of Medline abstracts annotated with anaphora-antecedent relations. We developed different resolution models for different types of sortal anaphora, as well as a method to determine the anaphoricity of definite phrases. For anaphora resolution, we introduced a new set of features that allows us to reduce the error rate by a further 10%.
Keywords/Search Tags:Resolution, Sortal anaphora, Biology, Term