Generation Of Automatic Summarization Based On Improved HMM

Posted on: 2018-06-24
Degree: Master
Type: Thesis
Country: China
Candidate: C L Wu
Full Text: PDF
GTID: 2348330512971492
Subject: Engineering
Abstract/Summary:
With the continuous development of computer technology, electronic information and text content are growing explosively, and the network has become the most important way for people to obtain and transmit information. Finding content of interest in such a large volume of information takes a great deal of time, so effective and simple information retrieval technology is badly needed in the Internet era. Automatic summarization is a powerful tool for condensing information. A summary generated by a computer program should have the following characteristics: (1) it effectively expresses the main idea of the original text; (2) its language is concise and short; (3) its text is semantically coherent, readable, and intelligible.

This thesis analyzes the traditional approaches to automatic summarization, most of which compute sentence weights and extract the relatively important sentences. Because such methods score each sentence largely in isolation, the thesis applies the state transition characteristics of the hidden Markov model (HMM) to automatic summarization, which makes the generated summary more consistent with the context. On this basis, the traditional HMM is further improved by adding the backward probability of the observation state. The results show that the accuracy of the generated summary is clearly improved. The specific work is as follows.

First, taking web news text as its object, the thesis studies the technical theory relevant to automatic summarization. Because of the particular structure of web news pages, it combines regular expressions with a block distribution algorithm to extract the body text, and uses the maximum matching algorithm and the TF-IDF algorithm to compute frequency statistics for the text.

Then, after an in-depth study of the traditional HMM, an HMM is built whose states "A", "B", "C", "D", "E" are ordered by sentence importance, and which combines four sentence features, including frequency, position, and title correlation. Comparison shows that the HMM-based summary reflects the context better than a summary produced by the sentence-weight method.

Finally, the HMM is improved to suit the characteristics of web news text: the emission of an observed state depends not only on the hidden state at time t but also on the hidden state at time t+1, which further refines the probability of extracting a summary sentence. The learning algorithm and the Viterbi algorithm are modified to match the adjusted model. A comparison of summary quality and running time against the conventional HMM verifies the feasibility of the improved HMM.
Keywords/Search Tags:Automatic summarization, Web news page, HMM, Improved HMM
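
The abstract names the Viterbi algorithm as the decoding step but gives no parameters, so the following is only a minimal sketch of standard (unimproved) Viterbi decoding, not the thesis's modified version. Everything concrete in it is an assumption made for illustration: the model is reduced to two importance states ("A" and "B") instead of five, each sentence is observed as a single hypothetical "high" or "low" score, and all probabilities are made up.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (best_log_prob, best_path) for a discrete HMM (standard Viterbi)."""

    def log(p):
        # Work in log space to avoid underflow on long documents.
        return math.log(p) if p > 0 else float("-inf")

    # scores[s] = best log-probability of any state path ending in s so far.
    scores = {
        s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states
    }
    backpointers = []

    for obs in observations[1:]:
        new_scores, pointers = {}, {}
        for s in states:
            # Best predecessor of state s at this step.
            prev = max(states, key=lambda r: scores[r] + log(trans_p[r][s]))
            new_scores[s] = (
                scores[prev] + log(trans_p[prev][s]) + log(emit_p[s][obs])
            )
            pointers[s] = prev
        scores = new_scores
        backpointers.append(pointers)

    # Backtrack from the best final state.
    last = max(states, key=lambda s: scores[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return scores[last], path


# Hypothetical toy model: "A" = important sentence, "B" = unimportant sentence.
STATES = ["A", "B"]
START_P = {"A": 0.6, "B": 0.4}
TRANS_P = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}}
EMIT_P = {"A": {"high": 0.9, "low": 0.1}, "B": {"high": 0.2, "low": 0.8}}

if __name__ == "__main__":
    # One observation per sentence, in document order.
    log_prob, path = viterbi(["high", "low", "low"], STATES, START_P, TRANS_P, EMIT_P)
    print(path, math.exp(log_prob))
```

In a sketch like this, sentences decoded as "A" would be the candidates for the summary. The improved model described in the abstract would additionally condition each emission on the hidden state at time t+1, which this standard version does not do.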