
A Conceptual Query Based Multi-Document Summarization In Biomedical Domain

Posted on: 2012-07-15
Degree: Master
Type: Thesis
Country: China
Candidate: Y Shang
GTID: 2218330368987805
Subject: Computer application technology
Abstract/Summary:
The volume of biomedical literature has grown rapidly in recent years. This mass of literature offers biomedical researchers the knowledge they need, but it also makes searching for and retrieving relevant information a major challenge. In biomedical databases, a single biomedical concept may be associated with a great many relevant articles: a query for some diseases returns more than ten thousand results, and some genes return thousands. Reading these papers one by one is time-consuming. It is therefore important for biomedical researchers to be able to understand a query concept quickly by integrating the various relevant resources.

Automatic text summarization is the process of condensing text content, using fewer words or sentences to convey the content of a whole document set, which helps readers understand the text quickly. Applying automatic text summarization to biomedical text mining can make the work of biomedical researchers more efficient. In this paper, we apply automatic summarization techniques to generate summaries for two kinds of concepts: diseases and genes. According to the characteristics of each corpus, we use semantic relation extraction for the disease task and learning to rank for the gene task.

In the disease summary generation task, we propose a method that uses biomedical semantic relations so that the summary covers more semantic information about the query disease. We first extract the semantic relations in the biomedical text. The relations relevant to the query concept are then extracted and selected. Sentences are ranked and extracted according to these semantic relations to generate a summary for the given concept. We evaluate the method on 24 common diseases; the generated summaries cover the causes, types, and treatments of the given disease. Experimental results show that this method improves summarization performance. Compared with the general method, summarization with semantic relations integrates the content of multiple documents at the semantic level, which meets the needs of biomedical researchers.

In the gene summary generation task, we treat automatic summarization as a ranking problem and apply learning to rank, a machine learning method, to solve it automatically. Gene-relevant sentences are scored with three kinds of features: a gene ontology relevance score, a topic relevance score, and a TextRank score. A learning-to-rank algorithm learns the feature weights; the resulting weight vector is used to predict the scores of candidate summary sentences, and the top-scoring sentences form the summary. Experimental results show that combining the three features improves summary performance. Applying learning to rank also makes it easier to add further features for measuring the significance of sentences.
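
The sentence-scoring step of the gene task can be illustrated with a minimal sketch. It assumes the learned model is a linear combination of the three features, which the abstract suggests but does not state. The class name `Candidate`, the function `rank_sentences`, the feature values, and the weights are all hypothetical placeholders, not values from the thesis.

```python
from typing import List, NamedTuple, Tuple


class Candidate(NamedTuple):
    """A candidate summary sentence with its three feature scores (hypothetical values)."""
    text: str
    go_score: float        # gene ontology relevance score
    topic_score: float     # topic relevance score
    textrank_score: float  # TextRank score


def rank_sentences(
    candidates: List[Candidate],
    weights: Tuple[float, float, float],
    top_k: int = 2,
) -> List[Candidate]:
    """Score each sentence as a weighted sum of its features and keep the top_k.

    Assumption: a linear scoring function. In a real system the weights would be
    learned by a learning-to-rank algorithm; here they are simply passed in.
    """
    w_go, w_topic, w_textrank = weights

    def score(c: Candidate) -> float:
        return (w_go * c.go_score
                + w_topic * c.topic_score
                + w_textrank * c.textrank_score)

    return sorted(candidates, key=score, reverse=True)[:top_k]


# Placeholder candidates; the texts and scores are illustrative only.
candidates = [
    Candidate("sentence A", go_score=0.9, topic_score=0.2, textrank_score=0.1),
    Candidate("sentence B", go_score=0.1, topic_score=0.8, textrank_score=0.7),
    Candidate("sentence C", go_score=0.3, topic_score=0.3, textrank_score=0.3),
]
weights = (0.5, 0.3, 0.2)  # hypothetical feature weight vector

summary = rank_sentences(candidates, weights, top_k=2)
summary_texts = [c.text for c in summary]
```

Under this assumption, adding a new feature only means adding one field and one weight, which is consistent with the abstract's point that learning to rank makes further feature expansion easier.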
Keywords/Search Tags: Automatic Summarization, Feature Selection, Learning to Rank, Semantic Relation Extraction, Similarity Calculation