Research Of Automatic Summarization Based On Named Entity

Posted on: 2010-07-26 | Degree: Master | Type: Thesis
Country: China | Candidate: D An
GTID: 2178360278460856 | Subject: Computer application technology
Abstract:
With the spread of the Internet, the network has become a huge information resource. While this mass of information is convenient, it also makes it harder to find the information we actually need. An important news event is reported by many websites, which makes it easy to access. However, when faced with tens of thousands of news documents that are identical or similar in meaning, readers find it difficult to grasp their main idea, and reading and analyzing them all takes too much time and energy. Automatic summarization extracts concise, important information from documents on a specific subject. News summarization is an application of multi-document summarization that helps readers quickly grasp the gist of the news. Multi-document summarization faces three main difficulties: selecting sentences for the summary, excluding redundancy, and ordering the selected sentences. This thesis proposes a method of automatic news summarization based on named entities. The method identifies the important elements of the news by recognizing and counting the named entities in the news documents, excludes redundant sentences by computing their similarities, and then orders the selected sentences according to the time information they contain.

The automatic summarization system based on named entities implements the proposed method. The system first identifies and extracts the named entities that represent the key elements of a news story, such as times, locations, persons, and organizations, and then counts them. The weight of each sentence is computed from the frequencies of its named entities, the position of the sentence, and its length. The highest-weighted sentences are chosen as a preliminary set of summary sentences.
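The weighting step above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the abstract names the three factors (entity frequency, sentence position, sentence length) but gives no formula, so the linear combination, the coefficients `a`, `b`, `c`, the target length `ideal_len`, and the `Sentence` record are all hypothetical. Named entities are assumed to have been tagged by a separate recognizer that is not shown.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Sentence:
    text: str
    doc_id: int
    position: int    # 0-based index of the sentence within its document
    doc_length: int  # number of sentences in that document
    # Named entities (times, locations, persons, organizations) assumed to be
    # pre-tagged by an external recognizer; hypothetical input format.
    entities: list = field(default_factory=list)


def count_entities(sentences):
    """Count how often each named entity occurs across all news documents."""
    counts = Counter()
    for s in sentences:
        counts.update(s.entities)
    return counts


def sentence_weight(s, counts, max_count, ideal_len=20, a=0.6, b=0.25, c=0.15):
    # Entity term: summed corpus frequencies of the sentence's distinct
    # entities, scaled by the count of the most frequent entity.
    entity_score = sum(counts[e] for e in set(s.entities)) / max_count if max_count else 0.0
    # Position term (hypothetical): earlier sentences in an article score higher.
    position_score = 1.0 - s.position / s.doc_length
    # Length term (hypothetical): favour sentences up to a target word count.
    length_score = min(len(s.text.split()), ideal_len) / ideal_len
    return a * entity_score + b * position_score + c * length_score


def preliminary_selection(sentences, k):
    """Return the k highest-weighted sentences as the preliminary summary set."""
    counts = count_entities(sentences)
    max_count = max(counts.values(), default=0)
    ranked = sorted(sentences, key=lambda s: sentence_weight(s, counts, max_count), reverse=True)
    return ranked[:k]
```

Giving the entity term the largest coefficient reflects the abstract's emphasis on named entities as the key news elements; in practice the coefficients would need tuning on data.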
Redundant sentences are then excluded by computing the pairwise similarities between them. Finally, the remaining sentences are sorted by the time information they contain. Experimental results indicate that the system is effective and applicable in practice.
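Redundancy removal and time ordering might look like the sketch below. It assumes, without confirmation from the abstract, that sentence similarity is the cosine of bag-of-words term-frequency vectors (one common vector space model instantiation), that `threshold=0.5` separates near-duplicates, and that each candidate already carries a `date` resolved from its time expressions. The regex tokenizer is a stand-in; Chinese news text would need a word segmenter instead.

```python
import math
import re
from collections import Counter
from datetime import date


def tf_vector(text):
    """Term-frequency vector of a sentence (simple vector space model)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def remove_redundant(candidates, threshold=0.5):
    """Keep a (text, date) candidate only if it is less similar than
    `threshold` to every sentence already kept. Candidates should arrive in
    descending weight order so the higher-weighted near-duplicate survives."""
    kept, vectors = [], []
    for text, when in candidates:
        vec = tf_vector(text)
        if all(cosine(vec, other) < threshold for other in vectors):
            kept.append((text, when))
            vectors.append(vec)
    return kept


def order_by_time(sentences):
    # sorted() is stable, so sentences with the same date keep their weight order.
    return sorted(sentences, key=lambda pair: pair[1])
```

Usage: `order_by_time(remove_redundant(candidates))` drops a sentence that nearly repeats an earlier, higher-weighted one, then arranges the survivors chronologically.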
Keywords: Automatic Summarization, Multi-Document Summarization, Named Entity, Vector Space Model, Sentence Similarity