A Conceptual Query Based Multi-Document Summarization In Biomedical Domain

Posted on:2012-07-15

Degree:Master

Type:Thesis

Country:China

Candidate:Y Shang

Full Text:PDF

GTID:2218330368987805

Subject:Computer application technology

Abstract/Summary:

The volume of biomedical literature has grown rapidly in recent years. This mass of literature offers biomedical researchers many kinds of needed knowledge, but it also poses a great challenge for searching and retrieving relevant information. In biomedical databases, a single biomedical concept may be associated with a great number of relevant papers: some diseases return more than ten thousand results, and some genes return thousands of papers. Reading these papers one by one is time-consuming. It is therefore important for biomedical researchers to be able to quickly understand a query concept by integrating the various relevant resources.

Automatic text summarization is the process of condensing a text and generating content that uses fewer words or sentences to represent the whole document set, which helps readers understand the text quickly. Applying automatic text summarization to biomedical text mining can make the work of biomedical researchers more efficient. In this thesis, we apply automatic summarization to generate summaries for two kinds of concepts: diseases and genes. Based on the characteristics of each corpus, the two tasks are addressed with semantic relation extraction and learning to rank, respectively.

In the task of disease summary generation, we propose a method that uses biomedical semantic relations to improve disease summary generation, so that the summary covers more semantic information about the query disease. We first extract the semantic relations in biomedical text. Then the relations relevant to the query concept are extracted and selected. Finally, sentences are ranked and extracted according to these semantic relations to generate a summary for the given concept. We evaluate 24 common diseases in the experiment, and the generated summaries contain the causes, types, and treatments of the given diseases. Experimental results show that this method improves summarization performance. Compared with a general summarization method, summarization with semantic relations can integrate the content of multiple documents at the semantic level, which meets the needs of biomedical researchers.

In the task of gene summary generation, we treat automatic summarization as a ranking problem and solve it automatically with a machine learning method, learning to rank. Gene-relevant sentences are scored with three kinds of features: a Gene Ontology relevance score, a topic relevance score, and a TextRank score. We obtain the feature weights with a learning-to-rank algorithm. We then predict the scores of candidate summary sentences with the resulting feature weight vector and select the top-scoring sentences to form the summary. Experimental results show that combining the three features improves summary performance. Moreover, using learning to rank makes it easier to add further features for measuring sentence significance.
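The disease summary steps described above can be sketched in outline. The steps are to keep the relations that involve the query concept and then rank sentences by those relations. This is a minimal illustration, not the thesis's implementation, and it rests on several assumptions:

- Relations have already been extracted into a hypothetical `Relation(subject, predicate, obj, sentence_id)` record.
- The predicate labels are illustrative.
- The greedy scoring rule is an assumed heuristic meant to cover causes, types, and treatments before any one aspect repeats. It scores one point per query-relevant relation plus one point per relation type not yet covered.

```python
from collections import defaultdict
from typing import NamedTuple


class Relation(NamedTuple):
    """An extracted semantic relation and its source sentence (hypothetical schema)."""
    subject: str
    predicate: str  # illustrative labels such as "CAUSES", "TREATS", "IS_A"
    obj: str
    sentence_id: int


def rank_by_relations(relations, query, k=3):
    """Greedily pick up to k sentence ids for a summary of `query`.

    Assumed heuristic: a sentence scores one point per query-relevant relation
    it contains, plus one point per relation type the summary has not covered yet.
    """
    by_sentence = defaultdict(list)
    for r in relations:
        if query in (r.subject, r.obj):  # keep only relations involving the query
            by_sentence[r.sentence_id].append(r)

    selected, covered = [], set()

    def gain(sid):
        preds = {r.predicate for r in by_sentence[sid]}
        return len(by_sentence[sid]) + len(preds - covered)

    while by_sentence and len(selected) < k:
        # max() keeps the first maximum, so ties go to the earliest sentence id.
        best = max(sorted(by_sentence), key=gain)
        selected.append(best)
        covered |= {r.predicate for r in by_sentence.pop(best)}
    return selected


rels = [
    Relation("smoking", "CAUSES", "lung cancer", 0),
    Relation("asbestos", "CAUSES", "lung cancer", 0),
    Relation("cisplatin", "TREATS", "lung cancer", 1),
    Relation("small cell carcinoma", "IS_A", "lung cancer", 2),
    Relation("smoking", "CAUSES", "lung cancer", 3),
    Relation("aspirin", "TREATS", "headache", 4),
]
print(rank_by_relations(rels, "lung cancer", k=3))
```

Sentence 3 only repeats a cause already covered by sentence 0, so the coverage bonus lets the treatment and type sentences win. In a real pipeline the selected ids would be mapped back to sentence text.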
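One of the three gene-sentence features is a TextRank score. TextRank ranks sentences as a graph: each sentence is a node, edges are weighted by word overlap, and a PageRank-style iteration spreads importance across the graph. The sketch below follows the standard TextRank formulation, which uses a damping factor d = 0.85 and a word-overlap similarity normalized by log sentence length. The regex tokenizer, iteration limit, and tolerance are assumptions, and the thesis may compute its TextRank score differently.

```python
import math
import re


def similarity(a, b):
    """TextRank word-overlap similarity: |shared words| / (log|a| + log|b|)."""
    denom = math.log(len(a)) + math.log(len(b)) if a and b else 0.0
    if denom <= 0:
        return 0.0  # empty or single-word sentences: avoid division by zero
    return len(set(a) & set(b)) / denom


def textrank(sentences, d=0.85, iters=100, tol=1e-6):
    """Score sentences with weighted PageRank over the sentence-similarity graph."""
    tokens = [re.findall(r"\w+", s.lower()) for s in sentences]
    n = len(tokens)
    if n == 0:
        return []
    w = [[similarity(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    out = [sum(row) for row in w]  # total edge weight leaving each node
    scores = [1.0] * n
    for _ in range(iters):
        new = [(1 - d) + d * sum(w[j][i] / out[j] * scores[j]
                                 for j in range(n) if out[j] > 0)
               for i in range(n)]
        delta = max(abs(x - y) for x, y in zip(new, scores))
        scores = new
        if delta < tol:
            break
    return scores


sents = [
    "BRCA1 is a tumor suppressor gene.",
    "Mutations in BRCA1 increase breast cancer risk.",
    "BRCA1 mutations are linked to ovarian cancer.",
    "The weather was pleasant.",
]
for s, sc in zip(sents, textrank(sents)):
    print(f"{sc:.3f}  {s}")
```

The last sentence shares no words with the others, so it receives only the base score 1 - d = 0.15. The two sentences that share "brca1", "mutations", and "cancer" reinforce each other and score highest.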
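The gene summary task treats summarization as ranking, in three steps:

1. Each candidate sentence becomes a feature vector: Gene Ontology relevance, topic relevance, and TextRank.
2. A learning-to-rank algorithm fits a weight vector.
3. Sentences are scored by the dot product, and the top ones form the summary.

The abstract does not name the learning-to-rank algorithm. The sketch below therefore assumes one simple pairwise approach: a linear model trained with a RankNet-style logistic loss on sentence pairs using stochastic gradient descent. The function names `train_pairwise` and `summarize`, the relevance labels, and all feature values are hypothetical toy inputs.

```python
import math
from itertools import combinations


def train_pairwise(features, labels, lr=0.1, epochs=200):
    """Fit linear feature weights with a pairwise logistic (RankNet-style) loss.

    features: one vector per training sentence, e.g. [go_score, topic_score, textrank_score]
    labels:   graded relevance (hypothetical); every pair with different labels
              becomes an example asking the better sentence to score higher.
    """
    w = [0.0] * len(features[0])
    pairs = [(i, j) if labels[i] > labels[j] else (j, i)
             for i, j in combinations(range(len(features)), 2)
             if labels[i] != labels[j]]
    for _ in range(epochs):
        for better, worse in pairs:
            diff = [a - b for a, b in zip(features[better], features[worse])]
            margin = sum(wk * dk for wk, dk in zip(w, diff))
            # Loss log(1 + exp(-margin)); its negative gradient is sigmoid(-margin) * diff.
            # sigmoid(-m) is written via tanh to avoid overflow for large |m|.
            g = 0.5 * (1.0 - math.tanh(margin / 2.0))
            w = [wk + lr * g * dk for wk, dk in zip(w, diff)]
    return w


def summarize(sentences, features, w, k=2):
    """Score candidates with the learned weights and return the top-k sentences."""
    scores = [sum(wk * fk for wk, fk in zip(w, f)) for f in features]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in top]


# Toy training data: [GO relevance, topic relevance, TextRank] per sentence.
train_x = [[0.9, 0.2, 0.5], [0.1, 0.3, 0.4], [0.7, 0.1, 0.6], [0.2, 0.9, 0.3]]
train_y = [2, 0, 2, 1]
weights = train_pairwise(train_x, train_y)

candidates = ["sentence A", "sentence B", "sentence C"]
cand_x = [[0.8, 0.4, 0.7], [0.2, 0.1, 0.2], [0.5, 0.6, 0.3]]
print(summarize(candidates, cand_x, weights, k=2))
```

Because the model learns one weight per feature, adding another sentence-significance feature only means appending a column to each vector. This is the kind of feature expansion the abstract attributes to learning to rank.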

Keywords/Search Tags:

Automatic Summarization, Feature Selection, Learning to Rank, Semantic Relation Extraction, Similarity Calculation

Related items

1	Microblogging Automatic Summarization Research
2	Semantic Based Similarity Analysis Of Human Video
3	Research On Feature-based Semantic Relation Extraction Between Entities
4	The Approach For Event-based Multi-document Automatic Summarization
5	Research On Automatic Summarization And The Application In Proposal Management
6	Research On Short Text Automatic Summarization Algorithm Based On TextRank And Word2Vec
7	Evaluation Method Research Of Automatic Summarization Calculating The Similarity Of Text Based On HowNet
8	Research Of Multi-Documents Summarization Based On Information Extraction And Semantic Similarity
9	Research On Computing Method Of Chinese Sentence Similarity Based On Deep Learning
10	Research Of Multiple Emails Automatic Summarization