
Concept And Passage Retrieval For Knowledge Discovery From Biomedical Literature

Posted on: 2009-07-03 | Degree: Master | Type: Thesis
Country: China | Candidate: R Chen
GTID: 2178360272470397 | Subject: Computer system architecture
Abstract/Summary:
Extracting implicit biological relationships from the biomedical literature helps generate hypotheses that can then be tested experimentally. However, as the volume of biomedical literature grows, it has become impossible to read all of it manually. Don R. Swanson, professor emeritus in the department of information science at the University of Chicago, proposed a method of literature-based knowledge discovery across otherwise unrelated literatures. The method uncovers hidden connections in a large body of biomedical literature and turns them into hypotheses that can guide scientific experiments. Swanson's knowledge discovery method has since become a very active research topic.
The computation behind concept retrieval is simple and intuitive. In this thesis, co-occurrence analysis is used to extract potential knowledge from the biomedical literature database MEDLINE. The experiments focus on discovering new biological connections between diseases and chemicals, drugs or genes. Three methods for scoring and ranking MeSH terms are compared: z-score, TFIDF (Term Frequency Inverse Document Frequency) and PMI (Pointwise Mutual Information). We report on three sets of experiments, with Alzheimer's disease, migraine disorders and schizophrenia as starting topics, and evaluate performance with the information retrieval metrics introduced by the LitLinker system. Based on the characteristics of the three methods, a fusion formula is introduced to re-rank and re-select terms and so improve the final outcome. We found that z-score and TFIDF each performed better in different experiments, and our empirical results validate the effectiveness of the fusion approach.
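The abstract names the three scoring functions and a fusion step but does not give their formulas, so the following is a minimal Python sketch under stated assumptions rather than the thesis's implementation. Each document is reduced to a set of MeSH-like headings. The z-score compares the observed co-occurrence count of a term with the start topic against its expectation under a binomial independence model; TFIDF multiplies that co-occurrence count by log(N/df); PMI is log(N·n_AB / (n_A·n_B)). Because the thesis's fusion formula is not stated here, reciprocal rank fusion (k = 60) stands in for it. The helper names (`score_terms`, `rrf_fuse`, `open_discovery`) are hypothetical, `open_discovery` illustrates the Swanson-style step of keeping terms linked to top-ranked intermediate terms but never seen with the start topic, and the corpus is invented toy data.

```python
import math
from collections import Counter

def cooccurrence_counts(docs, start):
    """df[t] = documents containing t; co[t] = documents containing start and t."""
    df, co = Counter(), Counter()
    for terms in docs:
        terms = set(terms)
        df.update(terms)
        if start in terms:
            co.update(terms - {start})
    return df, co

def score_terms(docs, start):
    """Score each term co-occurring with `start` by z-score, TF-IDF and PMI."""
    n = len(docs)
    df, co = cooccurrence_counts(docs, start)
    n_a = df[start]
    scores = {"zscore": {}, "tfidf": {}, "pmi": {}}
    for t, n_at in co.items():
        p = df[t] / n                      # background rate of term t
        expected = n_a * p                 # co-occurrences expected if independent
        sd = math.sqrt(n_a * p * (1 - p))  # binomial standard deviation
        scores["zscore"][t] = (n_at - expected) / sd if sd > 0 else 0.0
        scores["tfidf"][t] = n_at * math.log(n / df[t])
        scores["pmi"][t] = math.log(n * n_at / (n_a * df[t]))
    return scores

def rrf_fuse(score_tables, k=60):
    """Reciprocal rank fusion: a stand-in, not the thesis's fusion formula."""
    fused = Counter()
    for table in score_tables.values():
        ranked = sorted(table, key=lambda t: (-table[t], t))  # ties broken by name
        for rank, t in enumerate(ranked, start=1):
            fused[t] += 1.0 / (k + rank)
    return sorted(fused, key=lambda t: (-fused[t], t))

def open_discovery(docs, start, b_terms):
    """Terms that co-occur with some B term but never with `start`."""
    b_terms = set(b_terms)
    direct, candidates = set(), set()
    for terms in docs:
        terms = set(terms)
        if start in terms:
            direct |= terms
        if terms & b_terms:
            candidates |= terms
    return sorted(candidates - direct)

# Invented toy corpus: each document is a set of MeSH-like headings.
docs = [
    {"Migraine Disorders", "Serotonin", "Vasospasm"},
    {"Migraine Disorders", "Spreading Cortical Depression", "Serotonin"},
    {"Migraine Disorders", "Humans"},
    {"Magnesium", "Vasospasm"},
    {"Magnesium", "Spreading Cortical Depression"},
    {"Humans", "Serotonin"},
    {"Humans"},
]
scores = score_terms(docs, "Migraine Disorders")
ranking = rrf_fuse(scores)
print(ranking)  # ['Serotonin', 'Spreading Cortical Depression', 'Vasospasm', 'Humans']
print(open_discovery(docs, "Migraine Disorders", ranking[:3]))  # ['Magnesium']
```

On this invented corpus all three scores agree, so fusion changes nothing; the open-discovery step surfaces Magnesium only because the toy data were built to echo Swanson's migraine and magnesium example. Rank-based fusion was chosen for the sketch because the three scores live on incomparable scales.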
The terms discovered by concept retrieval also include more trend-oriented topics, which meets the requirements of knowledge discovery.
Because the terms returned by concept-retrieval-based knowledge discovery are widely dispersed, which hurts the final ranking, passage retrieval is introduced in a further experiment. In this experiment, passage retrieval extracts MeSH concepts from complete sentences in MEDLINE abstracts. To make the results comparable with those of the concept retrieval experiment, the same scoring methods and starting topics are used, and precision and recall are evaluated. The results show that the MeSH concepts discovered by passage retrieval are more concentrated and have higher precision than those found by concept retrieval. However, this concentration of MeSH concepts lowers recall and weakens the ability to reveal development trends.
In open discovery, the experiments reproduce three classic implicit connections found by Swanson: Alzheimer's disease and indomethacin, migraine and magnesium, and schizophrenia and calcium-independent phospholipase A2. Our experiments also discover other connections to the three starting topics. These potential connections can help experts uncover implicit relations in the literature and bring them to light, thereby achieving knowledge discovery.
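The contrast between abstract-level (concept) co-occurrence and sentence-level (passage) co-occurrence, together with the precision and recall comparison, can be sketched as follows. This is an illustrative assumption, not the thesis's pipeline: a naive regex sentence splitter and a hypothetical substring-matching tagger (`find_terms`) stand in for real MeSH concept extraction, raw co-occurrence counts stand in for the scoring methods, and the abstracts and the `relevant` set are invented. Precision is computed over the terms actually returned in the top k; recall is measured against the relevant set.

```python
import re
from collections import Counter

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def find_terms(text, vocabulary):
    """Hypothetical concept tagger: case-insensitive substring match."""
    low = text.lower()
    return {t for t in vocabulary if t.lower() in low}

def cooccurrences(abstracts, start, vocabulary, passage_level):
    """Count terms co-occurring with `start` per sentence or per whole abstract."""
    counts = Counter()
    for abstract in abstracts:
        units = split_sentences(abstract) if passage_level else [abstract]
        for unit in units:
            found = find_terms(unit, vocabulary)
            if start in found:
                counts.update(found - {start})
    return counts

def rank_terms(counts):
    """Sort by descending count, breaking ties alphabetically."""
    return sorted(counts, key=lambda t: (-counts[t], t))

def precision_recall(ranked, relevant, k):
    """Precision over the returned top-k terms; recall against `relevant`."""
    top = set(ranked[:k])
    hits = len(top & relevant)
    precision = hits / len(top) if top else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Invented toy abstracts, vocabulary and gold standard.
abstracts = [
    "Migraine is associated with serotonin release. Vasospasm was measured separately.",
    "Magnesium was given to the cohort. Migraine frequency fell.",
    "Serotonin and migraine were correlated.",
]
vocabulary = {"Migraine", "Serotonin", "Vasospasm", "Magnesium"}
relevant = {"Serotonin", "Magnesium"}

for passage_level in (False, True):
    ranked = rank_terms(cooccurrences(abstracts, "Migraine", vocabulary, passage_level))
    print(passage_level, ranked, precision_recall(ranked, relevant, k=3))
```

On this toy input, abstract-level counting returns Serotonin, Magnesium and Vasospasm (precision 2/3, recall 1.0), while sentence-level counting keeps only Serotonin (precision 1.0, recall 0.5): the same direction of trade-off the abstract reports, reproduced here by construction rather than as a result.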
Keywords/Search Tags: Knowledge Discovery, MeSH, Co-occurrence Analysis, Passage Retrieval