The Research And Implementation Of English Automatic Summarization

Posted on: 2015-01-10
Degree: Master
Type: Thesis
Country: China
Candidate: R X Peng
Full Text: PDF
GTID: 2255330428972920
Subject: Modern educational technology
Abstract/Summary:
The rapid development of science, technology, and the Internet has led to an information explosion. People either get lost in the sea of information or spend a great deal of time finding what they need. At a time when efficiency is highly valued, automatic summarization has received great attention because it can distill from a mass of information a concise text that preserves the important content of the original. Automatic summarization draws on a wide range of theory and applied techniques and is an important research direction in the field of Natural Language Processing.

Similarity measurement plays a key role in an automatic summarization system. It is a technique often used in data mining and generally includes lexical similarity, similarity between sentences, similarity between documents, and so on. Within a passage, links exist between sentences and between paragraphs. From these links, the importance of each sentence or paragraph can be inferred, which serves as an important indicator for selecting summary sentences. The summary is generated by ranking the candidate summary sentences according to their similarity results and then outputting them in order of weight. The quality evaluation of a summary also relies on similarity measurement.

The similarity measure used in this paper takes the characteristics of summarization into account: building on earlier algorithms, a new similarity measure based on LDA is proposed. Combined with this measure, an LDA sentence descending algorithm for English automatic summarization is designed.
Experiments on DUC data show that the proposed measure and algorithm are effective and perform well.

The main research contents of this paper are as follows:

Firstly, the paper analyzes the current state of automatic summarization research at home and abroad and introduces the definition and classification of automatic summarization. It also surveys current summarization techniques and their evaluation methods, and summarizes and classifies these techniques.

Secondly, it comprehensively analyzes existing methods for measuring sentence similarity and puts forward a textual similarity measure based on Latent Dirichlet Allocation. This method builds a topic space model through Latent Dirichlet Allocation, in which words, sentences, documents, and the corpus are all represented as vectors in the same topic space. It is well suited to measuring the similarity between summary sentences.

Thirdly, it presents a new English automatic summarization method that uses the aforementioned similarity measure, namely the LDA sentence descending algorithm. The procedure is as follows: the importance of each sentence is evaluated with the similarity measure, and unimportant sentences are removed until the summary length falls within its limit.
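
The sentence descending procedure described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not give the similarity formula or the importance score, so the sketch assumes cosine similarity between topic vectors, scores each sentence by its similarity to a document-level topic vector, and measures summary length in words. The function names `cosine` and `descend_sentences`, and the hand-written topic vectors in the usage note (which in practice would come from a trained LDA model), are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two topic-space vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a zero vector carries no topic information
    return dot / (norm_u * norm_v)


def descend_sentences(sentences, topic_vectors, doc_vector, max_words):
    """Drop the least important sentence until the summary fits max_words.

    Importance is assumed to be the similarity between a sentence's topic
    vector and the document's topic vector. Surviving sentences are
    returned in their original order.
    """
    remaining = list(range(len(sentences)))

    def total_words(indices):
        return sum(len(sentences[i].split()) for i in indices)

    # Always keep at least one sentence, even if it exceeds the limit.
    while len(remaining) > 1 and total_words(remaining) > max_words:
        weakest = min(
            remaining,
            key=lambda i: cosine(topic_vectors[i], doc_vector),
        )
        remaining.remove(weakest)

    return [sentences[i] for i in remaining]
```

For example, with three sentences whose two-topic vectors are `[0.9, 0.1]`, `[0.1, 0.9]`, and `[0.8, 0.2]`, a document vector of `[0.85, 0.15]`, and a limit of 9 words, the second sentence is the least similar to the document and is the first to be removed. Because importance is computed against a fixed document vector, scores do not change between iterations; a variant that recomputes scores after each removal would be an equally plausible reading of the abstract.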
Keywords/Search Tags: Automatic Summarization, LDA, Sentence Similarity, Topic Space Model