Research On Multi-document Automatic Summarization Of Online News

Posted on: 2012-07-07
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Xu
GTID: 2218330371462641
Subject: Signal and Information Processing
Abstract/Summary:
The growing popularity of the Internet and the continuous development of computer technology have made it more convenient for people to receive information. However, obtaining interesting information and useful knowledge from massive volumes of network data remains a serious and urgent problem. Among the many research approaches, multi-document summarization is considered one of the effective tools for addressing it. Multi-document summarization is a natural language processing technology that uses a computer to compress the main concepts of multiple documents on the same topic into a short text, and it has been successfully applied in both military and civil fields. This thesis studies multi-document automatic summarization of online news: the events of interest in a news topic are extracted, different measures are used to organize them, and a summary is produced. The research contributions of the thesis are as follows:

(1) Time expression recognition is studied, and a recognition method that combines conditional random fields (CRFs) with user-defined rules is proposed. To address the shortcomings of traditional recognition methods, namely their reliance on a single technique and their restriction to particular application domains, CRFs are first used for a preliminary recognition of time expressions, and user-defined rules are then applied to correct wrongly recognized expressions and recover missed ones. Experimental results show that the proposed method effectively improves the precision and recall of time expression recognition and establishes a flexible analysis model for the task.

(2) Event extraction is studied, and a method for extracting events from news text driven by event samples is put forward. Event trigger-driven and argument-driven methods suffer from an imbalance between positive and negative samples and from data sparseness. To address these problems, extraction is driven by event samples instead, and clustering is introduced to complete event extraction from online news documents, which removes the restriction to predefined event categories found in traditional methods. Experimental results indicate that the method improves event extraction performance.

(3) Multi-document automatic summarization is studied, and a summarization method based on event extraction is presented. To address the redundancy of paragraph-based and sentence-based multi-document summarization methods, event extraction is used to recast the original documents as a logical division based on events, and the summary is then derived by extracting, ordering and polishing the main ideas. Experimental results demonstrate that the resulting summaries are close to human understanding and help people grasp the causes and effects of events promptly and accurately.
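The event-based pipeline in contribution (3) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not specify the clustering algorithm, the similarity measure or the ordering criterion, so the sketch assumes greedy single-pass clustering over Jaccard word overlap, one representative sentence per event, and chronological ordering by publication date. The names `Sentence`, `cluster_events` and `summarize`, the `threshold` parameter and the English tokenizer are all hypothetical.

```python
import re
from dataclasses import dataclass
from datetime import date


@dataclass
class Sentence:
    text: str
    published: date  # assumption: date of the article the sentence came from


def tokens(text):
    """Lowercased word set; a stand-in for real news-text preprocessing."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard overlap of two token sets (0.0 if either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_events(sentences, threshold=0.3):
    """Greedy single-pass clustering (an assumed technique).

    Each sentence joins the first cluster whose seed sentence is similar
    enough; otherwise it starts a new cluster, i.e. a new event.
    """
    clusters = []
    for s in sentences:
        for c in clusters:
            if jaccard(tokens(s.text), tokens(c[0].text)) >= threshold:
                c.append(s)
                break
        else:
            clusters.append([s])
    return clusters


def summarize(sentences, threshold=0.3):
    """One sentence per event cluster, ordered chronologically."""
    clusters = cluster_events(sentences, threshold)
    # Representative of each event: its earliest report (an assumption).
    reps = [min(c, key=lambda s: s.published) for c in clusters]
    reps.sort(key=lambda s: s.published)
    return [r.text for r in reps]
```

Calling `summarize` on a list of `Sentence` objects drawn from several articles returns one sentence per detected event, so near-duplicate reports of the same event collapse into a single line. This is the redundancy reduction the abstract attributes to organizing documents by event rather than by paragraph or sentence.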
Keywords/Search Tags:Online News, Conditional Random Fields, User-Defined Rules, Time Expression, Classification, Event Sample, Clustering, Event Extraction, Multi-Document Summarization