Sentence Similarity Computation Based On Relation Vector Model And Research On Automatic Summarization

Posted on:2015-02-12

Degree:Master

Type:Thesis

Country:China

Candidate:Y M Yin

Full Text:PDF

GTID:2268330428461663

Subject:Computer technology

Abstract/Summary:

Sentence similarity computation is important in all fields of natural language processing. Some traditional algorithms compare sentences only on surface features such as shared words, sentence length, and word order, and ignore deep-level semantic information. Some methods that do consider sentence semantics perform unsatisfactorily in practice. Therefore, this thesis presents a relation vector model, built on the vector space model, that takes into account both the structural relationships and the semantic information of a sentence. The model combines the keywords of the sentence with the synonym information of those keywords, so it reflects the local structural components of the sentence as well as the correlations between them, and thereby better captures the structure and semantics of the sentence. A sentence similarity algorithm based on the relation vector model is proposed and applied to a network news summarization algorithm to avoid redundancy. Experiments show that, compared with an algorithm that considers word order and semantics, the relation vector model algorithm not only improves the accuracy of sentence similarity computation but also reduces its time complexity.

Automatic text summarization studies how to obtain summaries from natural language text automatically. A summary should contain the core of the article, or the content the user is interested in, and present it as semantically coherent paragraphs or chapters. Currently, understanding-based summarization relies on knowledge of the whole article and, because it is limited by the domain knowledge it requires, is applied only in some narrow fields.
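The abstract does not spell out how the relation vector is built, so the following is only a minimal sketch under stated assumptions, not the thesis's actual construction. It assumes sentences are already segmented into keyword lists, that synonym information comes from a lookup table (the tiny `SYNONYMS` dict here is hypothetical), and that "relations" between local structures are modelled as ordered pairs of adjacent keywords. Similarity is the cosine of the two sparse vectors, as in the vector space model.

```python
from collections import Counter
from math import sqrt

# Hypothetical synonym table: maps a keyword to a canonical synonym-group label.
# A real system would draw this from a thesaurus; this table is illustrative only.
SYNONYMS = {
    "buy": "purchase",
    "purchase": "purchase",
    "car": "vehicle",
    "automobile": "vehicle",
}


def canon(word: str) -> str:
    """Map a keyword to its synonym group, or return it unchanged."""
    return SYNONYMS.get(word, word)


def relation_vector(keywords: list[str]) -> Counter:
    """Sparse vector of keyword features plus adjacent keyword-pair features.

    Assumption: a 'relation' is an ordered pair of neighbouring keywords,
    after both have been mapped to their synonym groups.
    """
    terms = [canon(w) for w in keywords]
    vec = Counter(terms)
    vec.update(f"{a}->{b}" for a, b in zip(terms, terms[1:]))
    return vec


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values()) * sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def sentence_similarity(s1: list[str], s2: list[str]) -> float:
    """Similarity of two keyword lists under the hypothetical relation vector."""
    return cosine(relation_vector(s1), relation_vector(s2))
```

For example, `sentence_similarity(["he", "buy", "car"], ["he", "purchase", "automobile"])` returns 1.0 because the synonyms fall into the same groups, while reversing the word order of the first sentence (`["car", "buy", "he"]`) gives 0.6, since only the single-keyword features still match. Building each vector is linear in the number of keywords, which is one plausible reason such a model can be cheaper than alignment-based word-order measures.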
Statistics-based mechanical summarization selects a number of sentences according to external features of the article. Some such systems have reached practical application, but summary quality is unstable, sentences lack coherence, and the summaries are sometimes redundant. This thesis introduces the concept of hot words, which originates on the internet, and proposes an automatic summarization system based on hot-word weights and sentence features. The system extracts hot words from the article using a hot-word dictionary and then normalizes their properties, including length, frequency, and index. Next, each sentence of the article is weighted according to a fitting function. To exploit the useful information in the title, the thesis also presents a method for determining the type of the title and adjusts sentence weights according to the result. After all sentence weights are computed, the system selects sentences to form a crude abstract according to sentence weight and abstract length. Finally, it eliminates anaphora and redundancy in the crude abstract and arranges the sentences in their original order to form the final abstract. Experiments show that the system improves precision and recall and has some practical value.
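The summarization pipeline described above (hot-word scoring, fitting-function weighting, title adjustment, length-bounded selection with redundancy removal, and restoring the original order) can be sketched as follows. Every concrete choice here is an assumption, since the abstract gives no formulas: the `HOT_WORDS` dictionary, min-max normalization, a linear fitting function with made-up weights, reading "index" as the position of first occurrence, a flat `TITLE_BOOST` as a stand-in for the thesis's title-type method, and Jaccard overlap with a `REDUNDANCY` threshold as a stand-in for the relation-vector similarity. Anaphora elimination is not modelled.

```python
from collections import Counter

# Hypothetical hot-word dictionary; the thesis builds one from internet hot words.
HOT_WORDS = {"ai", "chip", "startup", "funding"}

# Hypothetical weights for a linear fitting function (not given in the thesis).
W_LEN, W_FREQ, W_POS = 0.2, 0.5, 0.3
TITLE_BOOST = 1.5  # assumed multiplier for sentences sharing a hot word with the title
REDUNDANCY = 0.5   # assumed overlap above which a sentence counts as redundant


def normalize(values: dict) -> dict:
    """Min-max normalize a property; a constant property maps to 1.0."""
    lo, hi = min(values.values()), max(values.values())
    return {k: (v - lo) / (hi - lo) if hi > lo else 1.0 for k, v in values.items()}


def hot_word_scores(sentences: list[list[str]]) -> dict[str, float]:
    """Score each hot word found in the article from its length, frequency, and index."""
    words = [w for s in sentences for w in s]
    freq = Counter(w for w in words if w in HOT_WORDS)
    if not freq:
        return {}
    length = {w: len(w) for w in freq}
    first = {w: words.index(w) for w in freq}  # assumed meaning of "index"
    n_len, n_freq, n_first = normalize(length), normalize(freq), normalize(first)
    # Earlier first occurrence scores higher, hence 1 - normalized position.
    return {w: W_LEN * n_len[w] + W_FREQ * n_freq[w] + W_POS * (1 - n_first[w])
            for w in freq}


def overlap(a: list[str], b: list[str]) -> float:
    """Jaccard overlap of two token lists (stand-in for sentence similarity)."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def summarize(title: list[str], sentences: list[list[str]], max_words: int) -> list[int]:
    """Return indices of the selected sentences, in original article order."""
    scores = hot_word_scores(sentences)
    title_hot = set(title) & HOT_WORDS
    weights = []
    for s in sentences:
        w = sum(scores.get(t, 0.0) for t in s)
        if title_hot & set(s):
            w *= TITLE_BOOST
        weights.append(w)
    chosen, used = [], 0
    # Greedy selection by weight; the stable sort keeps ties in article order.
    for i in sorted(range(len(sentences)), key=lambda i: -weights[i]):
        if used + len(sentences[i]) > max_words:
            continue  # would exceed the abstract length budget
        if any(overlap(sentences[i], sentences[j]) > REDUNDANCY for j in chosen):
            continue  # too similar to an already selected sentence
        chosen.append(i)
        used += len(sentences[i])
    return sorted(chosen)
```

For example, with the title `["ai", "chip", "startup"]` and the sentences `["ai", "chip", "startup", "raises", "funding"]`, `["the", "weather", "was", "mild"]`, `["ai", "chip", "startup", "raises", "new", "funding"]`, and `["investors", "back", "ai", "chip", "design"]`, calling `summarize(title, sentences, 12)` returns `[0, 3]`: sentence 2 is dropped as a near-duplicate of sentence 0, and sentence 1 no longer fits the length budget.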

Keywords/Search Tags:

relation vector model, sentence similarity, hot words, automatic summarization

Related items

1	Automatic Recognition Of Chinese Compound Sentence Relation Words Based On Neural Network And Sentence Feature Fusion
2	Chinese-Old Bilingual Text And Sentence Similarity Calculation Research
3	The Design And Implementation Of Multi-features Combination In Sentence Similarity Computation
4	Analysis Of Hierarchical Structure In The Marked Compound Sentences Based On Collocation Of Relation Words
5	Automatic Recognition Of Relation Words In Chinese Complex Sentence Based On Decision Tree
6	Automatic Establishment Of The Hierarchies Of The Dependency Relation Of Chinese Compound Sentence Based On Collocation Of Relation Words
7	An Automatic Recognition Method Of Chinese Relation Words In Compound Sentences Based On Dependency Tree Similarity
8	Hybrid Sentence Similarity Research Based On Semantic
9	Research On The Rule Excavation Method Based On Decision Tree In Automatic Identification Of Relation Words In Chinese Compound Sentences
10	Automatic Abstracting Based On Semantic Web