
Research on Key Techniques of Query-Focused Multi-Document Summarization

Posted on: 2009-09-23
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L Zhao
GTID: 1118360272958840
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet and the growing volume of text, users increasingly need to search large collections of documents for useful information, which has made automatic summarization more and more important. Automatic summarization automatically condenses the content of one or more documents into a short summary, which can save users considerable browsing time. The task involves many aspects of natural language processing and remains a major challenge for computers. This thesis describes our research on automatic summarization techniques, focusing on query-focused multi-document summarization and on the automatic evaluation of summary coherence.

Building on our participation in the DUC (Document Understanding Conference) evaluations in recent years, we implemented several summarization systems. Our machine-learning-based summarizer uses a CME (conditional maximum entropy) model. To capture the semantic relatedness between sentences and queries, we further proposed a semantic extension method and applied it to the summarization system: sentence vectors are semantically extended using WordNet synsets and the different word relations that WordNet defines. This brings semantic information into the sentence representation and noticeably improves the performance of the summarization system.

We also proposed a query expansion method based on a graph-based ranking algorithm and incorporated it into the query-focused summarization system to address the sparsity of information in the original query. The method uses context information to expand the query, retrieving more relevant information while introducing less noise. With query expansion, the system performs significantly better than without it, and it achieves state-of-the-art performance on the DUC evaluation data.

Another important problem is summary evaluation.
Currently, the linguistic quality of summaries is assessed manually, which is time-consuming, so developing automatic evaluation methods is important. We studied the entity-based coherence model and improved it in two respects, feature calculation and entity selection; each improvement raised accuracy over the base model in our experiments.
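The abstract says only that the machine-learning summarizer uses a CME model. Assuming it is used as a binary classifier that decides whether a sentence belongs in the summary, a conditional maximum entropy model over real-valued features reduces to logistic regression. The minimal sketch below trains one by gradient ascent on the regularized conditional log-likelihood; the three sentence features, the toy training data, and the names `train_maxent` and `prob_in_summary` are hypothetical, not taken from the thesis.

```python
import math
from typing import List, Sequence


def prob_in_summary(w: Sequence[float], x: Sequence[float]) -> float:
    """P(in summary | x) under a binary maximum entropy (logistic) model.

    w holds one weight per feature followed by a bias term.
    """
    z = w[-1] + sum(wj * xj for wj, xj in zip(w, x))  # zip stops before the bias
    return 1.0 / (1.0 + math.exp(-z))


def train_maxent(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.5, epochs: int = 500, l2: float = 0.01) -> List[float]:
    """Gradient ascent on the L2-regularized conditional log-likelihood."""
    n_feat = len(X[0])
    w = [0.0] * (n_feat + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_feat + 1)
        for xi, yi in zip(X, y):
            err = yi - prob_in_summary(w, xi)  # observed minus expected label
            for j, v in enumerate(xi):
                grad[j] += err * v
            grad[-1] += err
        for j in range(n_feat + 1):
            penalty = l2 * w[j] if j < n_feat else 0.0  # bias is not regularized
            w[j] += lr * (grad[j] / len(X) - penalty)
    return w


if __name__ == "__main__":
    # Hypothetical features: [position score, query-word overlap, normalized length]
    X = [[1.0, 0.8, 0.5], [0.2, 0.1, 0.9], [0.9, 0.7, 0.4],
         [0.1, 0.0, 0.3], [0.5, 0.9, 0.6], [0.3, 0.2, 0.2]]
    y = [1, 0, 1, 0, 1, 0]  # hypothetical labels: 1 = sentence in a reference summary
    weights = train_maxent(X, y)
    candidates = {"s1": [0.8, 0.9, 0.5], "s2": [0.2, 0.1, 0.5]}
    ranking = sorted(candidates,
                     key=lambda s: prob_in_summary(weights, candidates[s]),
                     reverse=True)
    print(ranking)
```

Sentences would then be ranked by `prob_in_summary` and added to the summary in that order until the length limit is reached; a practical system typically also filters redundant sentences, which this sketch omits.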
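The semantic extension of sentence vectors can be illustrated with a small sketch. Sentence vectors here are plain term-frequency vectors, and each word contributes its related words with a weight that depends on the relation type, so a sentence mentioning "car" can match a query about "automobile". The tiny `LEXICON` is a hypothetical stand-in for WordNet synsets and relations, and the weights in `RELATION_WEIGHT` are assumptions; the abstract does not give the thesis's actual weighting.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical stand-in for WordNet: word -> [(related word, relation type)]
LEXICON: Dict[str, List[Tuple[str, str]]] = {
    "car": [("automobile", "synonym"), ("vehicle", "hypernym")],
    "automobile": [("car", "synonym"), ("vehicle", "hypernym")],
    "physician": [("doctor", "synonym")],
    "doctor": [("physician", "synonym")],
}
# Hypothetical weights: closer relations contribute more.
RELATION_WEIGHT: Dict[str, float] = {"synonym": 0.8, "hypernym": 0.5}


def bag_of_words(tokens: List[str]) -> Dict[str, float]:
    """Plain term-frequency vector."""
    return {w: float(c) for w, c in Counter(tokens).items()}


def extend(tokens: List[str]) -> Dict[str, float]:
    """Term-frequency vector extended with weighted related words."""
    vec = bag_of_words(tokens)
    for word, count in Counter(tokens).items():
        for related, relation in LEXICON.get(word, []):
            added = count * RELATION_WEIGHT[relation]
            # Keep the strongest evidence for a word instead of summing duplicates.
            vec[related] = max(vec.get(related, 0.0), added)
    return vec


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    query = bag_of_words(["automobile", "safety"])
    sentence = ["car", "crash", "tests"]
    print(cosine(bag_of_words(sentence), query))  # no shared words
    print(cosine(extend(sentence), query))        # matches via "car" -> "automobile"
```

With NLTK and its WordNet data installed, `LEXICON` could instead be filled from `nltk.corpus.wordnet.synsets(word)`, using `Synset.lemma_names()` for synonyms and `Synset.hypernyms()` for hypernyms. Only the sentence side is extended here, following the abstract's description.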
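The abstract does not specify the graph or the ranking algorithm used for query expansion. The sketch below makes one plausible choice, labeled as such: a word co-occurrence graph built from context sentences, ranked with personalized PageRank that restarts at the query words, after which the top-ranked non-query words are added to the query. The graph construction, the damping factor `d`, the cutoff `k`, and all function names are assumptions, not the thesis's method.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set


def build_graph(sentences: List[List[str]]) -> Dict[str, Dict[str, float]]:
    """Undirected word graph; edge weight = number of sentences containing both words."""
    graph: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for sent in sentences:
        for a, b in combinations(sorted(set(sent)), 2):
            graph[a][b] += 1.0
            graph[b][a] += 1.0
    return graph


def personalized_pagerank(graph: Dict[str, Dict[str, float]], seeds: Set[str],
                          d: float = 0.85, iters: int = 50) -> Dict[str, float]:
    """Power iteration for PageRank whose random jumps return only to seed words."""
    nodes = list(graph)
    present = [n for n in nodes if n in seeds]
    if not present:
        return {n: 0.0 for n in nodes}
    restart = {n: (1.0 / len(present) if n in seeds else 0.0) for n in nodes}
    out_weight = {n: sum(graph[n].values()) for n in nodes}  # > 0 for every node
    score = dict(restart)
    for _ in range(iters):
        new = {n: (1.0 - d) * restart[n] for n in nodes}
        for n in nodes:
            for m, w in graph[n].items():
                new[m] += d * score[n] * w / out_weight[n]
        score = new
    return score


def expand_query(query: List[str], sentences: List[List[str]], k: int = 2,
                 stopwords: Iterable[str] = ()) -> List[str]:
    """Return up to k context words ranked highest with respect to the query."""
    scores = personalized_pagerank(build_graph(sentences), set(query))
    skip = set(query) | set(stopwords)
    candidates = [w for w in scores if w not in skip and scores[w] > 0.0]
    return sorted(candidates, key=lambda w: (-scores[w], w))[:k]


if __name__ == "__main__":
    context = [
        ["hurricane", "katrina", "damage", "levee"],
        ["levee", "failure", "flooding", "new", "orleans"],
        ["hurricane", "katrina", "evacuation"],
        ["stock", "market", "rally"],
    ]
    print(expand_query(["hurricane", "damage"], context))
```

Restarting the walk at the query words keeps the ranking anchored to the query, which is one way to draw related terms from context while limiting noise; in the example, words in a component disconnected from the query ("stock", "market", "rally") get zero score and are never added.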
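For the coherence work, assuming the base model is the entity grid of Barzilay and Lapata, a document is represented as a grid recording each entity's syntactic role in each sentence (S subject, O object, X other, - absent), and the features are the probabilities of role transitions between adjacent sentences. The sketch computes length-2 transition probabilities from a grid that is assumed to be already extracted; the `min_occurrences` filter is a hypothetical, simple stand-in for entity selection and not the improvement proposed in the thesis.

```python
from collections import Counter
from itertools import product
from typing import Dict, List

ROLES = "SOX-"  # subject, object, other, absent


def transition_features(grid: Dict[str, List[str]], n: int = 2,
                        min_occurrences: int = 1) -> Dict[str, float]:
    """Probability of each length-n role transition over the selected entities."""
    # Hypothetical entity selection: keep entities mentioned in enough sentences.
    columns = [roles for roles in grid.values()
               if sum(r != "-" for r in roles) >= min_occurrences]
    counts: Counter = Counter()
    total = 0
    for roles in columns:
        for i in range(len(roles) - n + 1):
            counts["".join(roles[i:i + n])] += 1
            total += 1
    return {"".join(t): (counts["".join(t)] / total if total else 0.0)
            for t in product(ROLES, repeat=n)}


if __name__ == "__main__":
    # Hypothetical 3-sentence grid: entity -> role in sentences 1..3
    grid = {
        "Microsoft": ["S", "O", "S"],
        "market": ["X", "-", "-"],
        "software": ["-", "O", "X"],
    }
    print(transition_features(grid)["SO"])
    print(transition_features(grid, min_occurrences=2)["SO"])
```

In the original entity grid work, such feature vectors are compared by a ranking model trained to prefer an original document over a permuted version of it; that ranking step is not shown here.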
Keywords/Search Tags: automatic summarization, natural language processing, machine learning, summary evaluation, text coherence