
Sentence Similarity Computation Based On Relation Vector Model And Research On Automatic Summarization

Posted on: 2015-02-12 | Degree: Master | Type: Thesis
Country: China | Candidate: Y M Yin
GTID: 2268330428461663 | Subject: Computer technology
Abstract/Summary:
Sentence similarity computation is important in all fields of natural language processing. Some traditional algorithms compare sentences only by their surface form, such as shared words, sentence length and word order, and ignore deep-level semantic information; some methods that do consider sentence semantics perform unsatisfactorily in practical use. Therefore, a relation vector model is presented that builds on the vector space model and takes into account both sentence structure and semantic information. The model combines the key words of a sentence with synonym information for those key words, so it captures the local structural components of the sentence as well as the correlations between them, and thus better reflects sentence structure and semantics. A sentence similarity algorithm based on the relation vector model is then proposed and applied to network news summary generation in order to avoid redundancy. Experiments show that, compared with an algorithm that considers word order and semantics, the relation vector model algorithm not only improves the accuracy of sentence similarity computation but also reduces its time complexity.

Automatic text summarization studies how to obtain summaries from natural language text automatically. A summary should contain the core content of the article, or the content users are interested in, and present it as semantically coherent paragraphs or sections. Currently, understanding-based summarization relies on knowledge of the whole article and, because it is limited by the need for domain knowledge, is applied only in some narrow fields.
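As a rough illustration of the relation vector idea, the Python sketch below represents each sentence by its keyword features plus adjacent keyword-pair features and compares sentences by cosine similarity. Everything concrete here is an assumption rather than the thesis's exact formulation: the synonym table `SYNONYMS`, the use of adjacent keyword pairs to stand in for "local structure" relations, the threshold value, and the helper names `relation_vector`, `sentence_similarity` and `drop_redundant` are all hypothetical. The last function shows how such a score could be used to filter redundant sentences out of a news summary.

```python
import math
from collections import Counter

# Hypothetical synonym table: head word -> set of synonyms.
SYNONYMS = {
    "car": {"automobile", "vehicle"},
    "buy": {"purchase"},
}


def canonical(word, synonyms=SYNONYMS):
    """Map a keyword to the head word of its synonym group, if any."""
    for head, group in synonyms.items():
        if word == head or word in group:
            return head
    return word


def relation_vector(keywords, synonyms=SYNONYMS):
    """Sparse vector of keyword features plus adjacent keyword-pair features.

    The pair features are an assumed stand-in for the model's
    'local structure' relations.
    """
    concepts = [canonical(w, synonyms) for w in keywords]
    vec = Counter(concepts)
    for pair in zip(concepts, concepts[1:]):
        vec[pair] += 1
    return vec


def cosine(u, v):
    """Cosine similarity of two sparse Counter vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values()) * sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def sentence_similarity(kw1, kw2):
    """Similarity of two sentences given their keyword lists."""
    return cosine(relation_vector(kw1), relation_vector(kw2))


def drop_redundant(sentences, keyword_lists, threshold=0.8):
    """Keep a sentence only if its similarity to every sentence already
    kept is below `threshold` (the threshold value is an assumption)."""
    kept = []
    for sent, kws in zip(sentences, keyword_lists):
        if all(sentence_similarity(kws, prev) < threshold for _, prev in kept):
            kept.append((sent, kws))
    return [sent for sent, _ in kept]
```

Mapping synonyms to one head word lets "purchase" match "buy", while the pair features make the score sensitive to keyword order: `sentence_similarity(["car", "buy"], ["buy", "car"])` gives 2/3 rather than 1.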
Statistics-based mechanical summarization selects sentences according to external features of the article. Some such systems have found practical application, but summary quality is unstable: the sentences lack coherence and the summary is sometimes redundant. This paper introduces the concept of hot words, which originates on the Internet, and proposes an automatic summarization system based on hot-word weights and sentence features. The system extracts hot words from the article using a hot-word dictionary and normalizes their properties, including length, frequency and index. Each sentence of the article is then weighted according to a fitting function. To exploit the useful information in the title, this paper also presents a method for determining the title type and adjusts the sentence weights according to the result. After all sentence weights are computed, the system selects sentences according to their weights and the target summary length to form a crude abstract. Finally, it eliminates anaphora and redundancy in the crude abstract and outputs the final abstract in the original sentence order. Experiments show that the system improves both precision and recall and has some practical value.
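The hot-word summarization pipeline can be sketched in the same spirit. This is a minimal sketch under stated assumptions, not the system described in the thesis: sentences are assumed to be pre-segmented token lists; the hot-word dictionary `hot_dict` is hypothetical and maps each word to a popularity value (our reading of "index"); min-max normalization and a linear weighted sum with made-up coefficients stand in for the unspecified fitting function. Title-type judgment is reduced to a simple boost for sentences that share a word with the title, redundancy is checked with token Jaccard overlap, and anaphora resolution is omitted. All function names are hypothetical.

```python
from collections import Counter


def min_max(values):
    """Min-max normalize to [0, 1]; a constant feature maps to 1.0 (assumption)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 1.0 for v in values]


def hot_word_scores(sentences, hot_dict, w_len=0.2, w_freq=0.5, w_idx=0.3):
    """Score each hot word found in the article.

    Features: word length, in-article frequency and the dictionary value
    (read here as a popularity index). The weighted sum is a placeholder
    for the thesis's fitting function; the weights are made up.
    """
    freq = Counter(t for sent in sentences for t in sent if t in hot_dict)
    if not freq:
        return {}
    words = list(freq)
    lens = min_max([len(w) for w in words])
    freqs = min_max([freq[w] for w in words])
    idxs = min_max([hot_dict[w] for w in words])
    return {w: w_len * ln + w_freq * fq + w_idx * ix
            for w, ln, fq, ix in zip(words, lens, freqs, idxs)}


def jaccard(a, b):
    """Token-set overlap, a simple stand-in redundancy measure."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def summarize(sentences, title, hot_dict, max_tokens=30,
              title_boost=1.5, redundancy=0.6):
    """Return the selected sentences in their original order."""
    scores = hot_word_scores(sentences, hot_dict)
    title_words = set(title)
    ranked = []
    for i, sent in enumerate(sentences):
        weight = sum(scores.get(t, 0.0) for t in sent)
        # Simplified stand-in for title-type handling: boost title overlap.
        if title_words & set(sent):
            weight *= title_boost
        ranked.append((weight, i))
    chosen, used = [], 0
    # Highest weight first; ties broken by earlier position.
    for weight, i in sorted(ranked, key=lambda p: (-p[0], p[1])):
        if weight <= 0:
            break  # remaining sentences carry no hot-word weight
        sent = sentences[i]
        if used + len(sent) > max_tokens:
            continue  # would exceed the length budget
        if any(jaccard(sent, sentences[j]) >= redundancy for j in chosen):
            continue  # too similar to a sentence already selected
        chosen.append(i)
        used += len(sent)
    # Output in original sentence order, as in the final step above.
    return [sentences[i] for i in sorted(chosen)]
```

The greedy loop mirrors the described order of operations (weight, select within a length budget, remove redundancy, restore sentence order); the coefficients and thresholds are placeholders that would need tuning on real data.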
Keywords/Search Tags: relation vector model, sentence similarity, hot words, automatic summarization