Development and Evaluation of a Predicate-Based Biomedical Search Engine Using Design Science Methodology

Posted on: 2012-08-02
Degree: Ph.D.
Type: Dissertation
University: The Claremont Graduate University
Candidate: Kwak, Myungjae
Full Text: PDF
GTID: 1468390011462377
Subject: Information Technology
Abstract/Summary:
With the rapid, almost exponential growth of online biomedical information, researchers need more time and effort to locate the information they want. Search engines that support finding precise, complementary, and contrasting information can help researchers pinpoint appropriate information, save time, and review more material. This research adopts a design science approach to develop and evaluate a predicate-based biomedical search engine. The search method uses triples as its basis: each triple combines medical entities, e.g., cigarette smoking and lung cancer, with the predicate, e.g., causes, that relates them. This contrasts with the lists of noun phrases used by existing search engines. Triples are considered better than noun phrases because they include not only medical entities but also the relations between them.

This study adheres to the design science approach and follows the process proposed by Takeda, Veerkamp, and Yoshikawa (1990) and the guidelines of Hevner, March, Park, and Ram (2004). The search engine's two critical components are iteratively developed, evaluated, and improved: (1) a predicate parser that extracts predicates from biomedical text, and (2) a predicate-based search engine that indexes the predicates and locates documents using matching algorithms. The first component, the predicate parser, was evaluated in a controlled user study and achieved 91% precision and recall, which makes it comparable to the best results reported in similar studies. The second component, the search engine, was evaluated using controlled user studies with representative researchers in biomedicine. The results showed that the hybrid approach, which combines triple-based and keyword-based search, achieved a statistically significant performance improvement over the keyword-based baseline. The average evaluation score of the new approach increased by 31.63% and precision improved by 12.94%.

This research has two implications. First, the study of the predicate parser shows that the combination of the two methods (Finite State Automata and Support Vector Machines) can improve both precision and recall for predicate extraction. Second, the study of the search algorithm shows that more sophisticated data structures (the predicates) can facilitate better search results.
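
To make the idea of a hybrid triple-plus-keyword search concrete, the following is a minimal illustrative sketch in Python. It is not the dissertation's implementation: the names `Triple`, `HybridIndex`, `normalize`, and the `triple_weight` parameter are hypothetical, the scoring rule is an assumption chosen for simplicity, and the triples are supplied by hand rather than extracted by a predicate parser.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """A (subject, predicate, object) relation, e.g. smoking-causes-cancer."""
    subject: str
    predicate: str
    obj: str


def normalize(t: Triple) -> Triple:
    """Lowercase and trim each field so equivalent triples compare equal."""
    return Triple(*(field.strip().lower() for field in t))


class HybridIndex:
    """Toy index that scores documents by triple match plus keyword overlap."""

    def __init__(self):
        self.by_triple = defaultdict(set)  # normalized triple -> doc ids
        self.by_term = defaultdict(set)    # lowercase term -> doc ids

    def add(self, doc_id, text, triples):
        # Keyword side: index every whitespace-separated term.
        for term in text.lower().split():
            self.by_term[term].add(doc_id)
        # Triple side: index each relation as a single unit.
        for t in triples:
            self.by_triple[normalize(t)].add(doc_id)

    def search(self, query, triple_weight=2.0):
        q = normalize(query)
        scores = defaultdict(float)
        # An exact triple match earns a fixed bonus (assumed weight).
        for doc_id in self.by_triple.get(q, ()):
            scores[doc_id] += triple_weight
        # Keyword overlap adds the fraction of query terms the document has.
        terms = set(" ".join(q).split())
        for term in terms:
            for doc_id in self.by_term.get(term, ()):
                scores[doc_id] += 1.0 / len(terms)
        # Highest score first; ties broken by document id.
        return sorted(scores, key=lambda d: (-scores[d], d))


index = HybridIndex()
index.add(
    "d1",
    "cigarette smoking causes lung cancer",
    [Triple("cigarette smoking", "causes", "lung cancer")],
)
index.add(
    "d2",
    "lung cancer screening among people with a history of cigarette smoking",
    [],
)
results = index.search(Triple("Cigarette smoking", "causes", "lung cancer"))
print(results)
```

In this sketch both documents mention the same entities, but only the first asserts the queried relation, so the triple match ranks it ahead of the document that merely shares keywords.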
Keywords/Search Tags:Search, Predicate, Biomedical, Design science, Information, Using