
Multi-dimensional fragment classification in biomedical text

Posted on: 2007-04-10
Degree: M.Sc
Type: Thesis
University: Queen's University (Canada)
Candidate: Pan, Fengxia
Full Text: PDF
GTID: 2448390005476450
Subject: Computer Science
Abstract/Summary:
Automated text categorization is the task of assigning input text to a set of categories. With the increasing availability of large collections of scientific literature, text categorization plays a critical role in managing information and knowledge, and biomedical text categorization is becoming an important area of research. The work presented here is motivated by the possibility of using automated text categorization to identify and characterize information-bearing text within biomedical literature. Under a recently suggested classification scheme [ShWR06], we examine the feasibility of using machine learning methods to automatically classify biomedical sentence fragments into a set of categories defined to characterize and accommodate certain types of information needs. The categories are grouped into five dimensions: Focus, Polarity, Certainty, Evidence, and Trend. We conduct experiments using a set of manually annotated sentences sampled from different sections of biomedical journal articles. Three classifiers are trained and evaluated on this dataset: a Maximum Entropy model designed specifically for this purpose, and two other algorithms popular in text categorization, Naive Bayes and Support Vector Machines (SVM). The preliminary results show that machine learning methods can classify biomedical text along certain dimensions with good accuracy.
Keywords/Search Tags: Text, Biomedical, Classification