Chinese And English Automatic Summarization Based On Topic Modeling

Posted on:2012-06-04

Degree:Master

Type:Thesis

Country:China

Candidate:M H Zhang

Full Text:PDF

GTID:2218330368991829

Subject:Computer application technology

Abstract/Summary:

With the rapid development of computer technology and the Internet, the volume of available information is growing explosively, and people's demand for locating information precisely has given strong impetus to research in natural language processing (NLP). Meanwhile, as research on cross-document information fusion advances, multi-document summarization (MDS) has become an active research topic; it can be used in question answering, search engines, topic detection, and other applications.
In this thesis, we analyze existing methods for automatic multi-document summarization in depth and apply a topic model to sentence salience determination. In addition, we use a dynamic model to control redundancy. We then implement an automatic multi-document summarization system based on these methods. Experimental results on the TAC 2008 and TAC 2009 corpora show that the system achieves good ROUGE performance.
The thesis focuses on the two key technologies of multi-document summarization: sentence salience determination and redundancy control. For sentence salience determination, we propose a sentence topic feature based on topic modeling. The results show that the topic feature plays a significant role in MDS, and that combining it with other traditional features further improves system performance. For redundancy control, we use dynamic modeling, and on this basis we also design an update dynamic model for the update summarization task. With the update dynamic model, the summary effectively avoids history redundancy. Results on the TAC 2008 corpus show that combining the two strategies (sentence salience determination and redundancy control) yields better system performance. In the update summarization task in particular, our result is better than the best result among the participating systems.
Finally, this thesis also evaluates a Chinese corpus before and after adding the topic model and the dynamic model. The results show that topic modeling and dynamic modeling are equally effective on the Chinese corpus. However, the Chinese MDS results are clearly worse than the English ones, possibly because the Chinese corpus requires more preprocessing, which can affect the performance of the whole system.
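The abstract does not give the scoring formulas, so the following is only a minimal Python sketch of the general idea, not the thesis's actual method. It assumes per-word topic weights are already available (the hand-set `WORD_TOPICS` values stand in for what a trained topic model such as LDA would supply), scores each sentence by the cosine similarity between its averaged topic vector and that of the whole document set, and uses a greedy word-overlap penalty as a stand-in for the unspecified dynamic model. All names, the toy sentences, and the `penalty` value are hypothetical.

```python
import math
import re

# Hypothetical per-word topic weights for two topics (disaster, politics).
# In practice these would come from a trained topic model, not be hand-set.
WORD_TOPICS = {
    "flood": [0.9, 0.1],
    "river": [0.8, 0.2],
    "rescue": [0.7, 0.3],
    "election": [0.1, 0.9],
    "vote": [0.2, 0.8],
}

SENTENCES = [
    "The flood hit the river towns.",
    "River towns were hit by the flood.",
    "Rescue teams reached the flood area.",
    "The election vote was delayed.",
]


def tokenize(sentence):
    """Lowercase a sentence and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def topic_vector(tokens, word_topics, n_topics):
    """Average the topic weights of the tokens that have known weights."""
    total = [0.0] * n_topics
    known = 0
    for token in tokens:
        weights = word_topics.get(token)
        if weights is None:
            continue  # words without topic weights carry no topic signal
        for k in range(n_topics):
            total[k] += weights[k]
        known += 1
    return [x / known for x in total] if known else total


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def jaccard(a, b):
    """Word-set overlap, used here as a crude redundancy measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_sentences(sentences, word_topics, n_topics, max_sentences, penalty=0.7):
    """Greedily pick sentence indices by topic salience minus redundancy."""
    tokens = [tokenize(s) for s in sentences]
    word_sets = [set(t) for t in tokens]
    doc_vec = topic_vector([w for t in tokens for w in t], word_topics, n_topics)
    salience = [cosine(topic_vector(t, word_topics, n_topics), doc_vec) for t in tokens]

    chosen = []
    while len(chosen) < min(max_sentences, len(sentences)):
        best, best_score = None, -math.inf
        for i in range(len(sentences)):
            if i in chosen:
                continue
            # Penalize overlap with the most similar already-selected sentence.
            redundancy = max(
                (jaccard(word_sets[i], word_sets[j]) for j in chosen), default=0.0
            )
            score = salience[i] - penalty * redundancy
            if score > best_score:
                best, best_score = i, score
        chosen.append(best)
    return chosen


picked = select_sentences(SENTENCES, WORD_TOPICS, n_topics=2, max_sentences=3)
for index in picked:
    print(SENTENCES[index])
```

In this toy input the first two sentences share most of their words, so once one of them is selected the other is heavily penalized and a sentence from a different topic can take its place. For an update task, one possible extension under the same assumptions would be to seed `chosen` with sentences the reader has already seen, so that new candidates are also penalized for overlapping with them.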

Keywords/Search Tags:

topic modeling, multi-document summarization, latent Dirichlet allocation, natural language processing

Related items

1	Sentence Extraction For Multi-Document Summarization Based On Topic Model And Semantics
2	Study On Multi-Document Summarization Algorithm Based On Fusing Topic Sentences Semantic
3	Research On Multi-Document Summarization Based On Topic Modeling And Semantic Analysis
4	Multi-document Summarization Based On HLDA Hierarchical Topic Model
5	The Study On Dynamical Topic Modeling And Text Summarization For Web Forums
6	Research On Hierarchical Topic Modeling Method For Multi-Document Summarization
7	Research And Implementation Of Document Summarization Based On Combined Multi-Feature
8	The Research Of Topic Based Multi-document Summarization
9	Research On Text Retrieval Based On Topic Analysis
10	Research On Generation Method Of Evolutionary Multi-document Summarization Based On Sub-topic Enhancement