
Research On EBM Multi-Document Summarization Technique

Posted on: 2011-09-04
Degree: Master
Type: Thesis
Country: China
Candidate: J Z Xie
GTID: 2178360302499163
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of network technology and the explosive growth of text information, acquiring valuable information efficiently and effectively from the vast body of medical literature has become a difficult problem. Automatic summarization of medical articles plays a significant role in addressing this problem within the text mining community.

After studying evidence-based medicine (EBM) and automatic summarization techniques in depth, we propose a summarization method that combines EBM domain knowledge with shallow textual features and semantic features. Based on this architecture, we design and implement a prototype system for EBM multi-document summarization. The system consists of several functional modules: document preprocessing, feature selection and sentence extraction, redundancy elimination, and sentence reordering. The key techniques discussed in this thesis are:

(1) To deal with ambiguous words and synonymy, we propose an algorithm for word sense disambiguation and synonym merging based on the Medical Subject Headings (MeSH) and the semantic dictionary WordNet. This method improves the recognition accuracy of feature items.

(2) For sentence extraction, we propose two methods. The first integrates several shallow textual features (word frequency, location, clue words, indicative phrases, and sentence entity density) to calculate a weight score for each sentence. The second is based on semantic features. We define a new feature, "Sentence Entity Density", which effectively eliminates the influence of sentence length on sentence weight.

(3) To reduce redundancy, we combine a simplified MMR-MD technique with a sentence similarity algorithm based on semantic analysis.

(4) To enhance the coherence and readability of the summary, we propose a reordering strategy that uses three criteria, listed here from highest to lowest priority: the timestamp of the sentence, the relative position of the sentence in its original document, and the evidence level of the medical article to which the sentence belongs.

(5) We evaluate the system in two ways. The first is ROUGE, an existing and widely used summarization evaluation method. The second is an evaluation method for domain-specific (here, EBM) summarization systems, in which the coverage of EBM domain words and the coverage of non-domain words are combined with different weights. Experimental results under both evaluation methods show that the automatic summaries achieve a comparatively high level of coverage.
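As an illustration only, the redundancy elimination and reordering steps in (3) and (4) could be sketched as follows. This is a minimal sketch under stated assumptions, not the thesis implementation: the `Sentence` fields, the `lam` trade-off parameter, the ascending sort direction, and the numeric evidence scale are all hypothetical, and a plain word-overlap (Jaccard) measure stands in for the semantic similarity algorithm, whose details the abstract does not give.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Sentence:
    text: str
    score: float         # relevance weight from sentence extraction (assumed given)
    published: date      # timestamp of the source article
    position: float      # relative position in the source document, 0.0 to 1.0
    evidence_level: int  # hypothetical scale: 1 = strongest evidence


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity; a stand-in for the semantic similarity measure."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def mmr_select(candidates, k, lam=0.7):
    """Greedy MMR: balance relevance against similarity to sentences already chosen."""
    selected = []
    pool = list(candidates)
    while pool and len(selected) < k:
        def mmr(s):
            # Redundancy is the highest similarity to any selected sentence.
            redundancy = max((jaccard(s.text, t.text) for t in selected), default=0.0)
            return lam * s.score - (1 - lam) * redundancy
        best = max(pool, key=mmr)
        selected.append(best)
        pool.remove(best)
    return selected


def reorder(sentences):
    """Sort by timestamp first, then relative position, then evidence level."""
    return sorted(sentences, key=lambda s: (s.published, s.position, s.evidence_level))
```

Sorting on a tuple key gives the three criteria a strict priority order: a later criterion only breaks ties left by the earlier ones. Whether earlier or later timestamps should come first is a design choice the abstract leaves open; ascending order is assumed here.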
Keywords/Search Tags: Multi-Document summarization, EBM, feature selection, sentence extraction, system evaluation