
Chinese Sentence Similarity Computation Based on Multi-feature Fusion

Posted on: 2017-03-09
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Gao
GTID: 2348330485481725
Subject: Computer application technology
Abstract/Summary:
Chinese sentence similarity computation rests on analyzing the features of Chinese sentences: similarity criteria are defined from these features, and the criteria are then combined to produce a single numeric value. Sentence similarity computation has long been an important practical technique, both actively studied and difficult, and it is widely applied across Natural Language Processing.

This paper describes existing Chinese sentence similarity methods and their problems in detail, and analyzes the similarity computation of the relation vector model. Based on three features (keywords, sentence length, and word order), we propose a Chinese sentence similarity method based on multi-feature fusion. The method extends the relation vector model through a closer study of the features of the Chinese language, and uses the co-occurrence of adjacent words to adjust the weights of the different features. It focuses mainly on the morphological similarity of keywords, while also taking local structure and synonyms into account. Building on a comprehensive analysis of surface features and sentence structure, this paper investigates and extends sentence similarity calculation in depth. The research covers the following aspects:

1) An analysis of Chinese sentences identifies the features related to sentence similarity. These features affect similarity to different degrees; experiments were used to select only those with a large impact, namely keywords, sentence length, and word order.
2) The contribution of keywords and sentence length in the relation vector model is made more effective, and additional factors such as word order and non-keywords are integrated to improve the accuracy of the similarity results.

Experimental results show that the proposed method is more accurate than the relation vector model when computing the similarity of news headlines. It outperforms the relation vector model in particular on sentence pairs that differ greatly in length, and its accuracy remains high even when punctuation and stop words are left in the sentences.
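To make the idea of fusing keyword, length, and word-order features concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the thesis's method. Sentences are assumed to be already segmented into word lists. Word-level similarity is a hypothetical exact-match-or-synonym check (`word_sim`, backed by a made-up `synonyms` table). The keyword score builds relation-vector-style vectors over the union of keywords and compares them with cosine similarity. Length similarity is `1 - |len(a) - len(b)| / (len(a) + len(b))`. Word order is scored here by counting inversions among shared words, which is a swapped-in choice. The fusion weights `(0.7, 0.1, 0.2)` are hypothetical fixed values; the thesis instead adjusts weights from adjacent-word co-occurrence, which this sketch does not reproduce.

```python
import math
from typing import Dict, Optional, Sequence, Set, Tuple

Synonyms = Dict[str, Set[str]]


def word_sim(w1: str, w2: str, synonyms: Synonyms) -> float:
    # Hypothetical word-level similarity: 1.0 for identical words or listed synonyms, else 0.0.
    if w1 == w2 or w2 in synonyms.get(w1, set()) or w1 in synonyms.get(w2, set()):
        return 1.0
    return 0.0


def relation_vector(sentence: Sequence[str], union: Sequence[str], synonyms: Synonyms) -> list:
    # One entry per union keyword: its best similarity to any word in the sentence.
    return [max((word_sim(u, w, synonyms) for w in sentence), default=0.0) for u in union]


def keyword_similarity(a: Sequence[str], b: Sequence[str], synonyms: Synonyms) -> float:
    union = list(dict.fromkeys(list(a) + list(b)))  # ordered union without duplicates
    va = relation_vector(a, union, synonyms)
    vb = relation_vector(b, union, synonyms)
    dot = sum(x * y for x, y in zip(va, vb))
    norm = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(y * y for y in vb))
    return dot / norm if norm else 0.0  # cosine of the two relation vectors


def length_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    total = len(a) + len(b)
    return 1.0 - abs(len(a) - len(b)) / total if total else 1.0


def order_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    # Assumed scoring: positions in b of words shared with a, taken in a's order,
    # then 1 - (inversions / maximum possible inversions).
    pos_b: Dict[str, int] = {}
    for i, w in enumerate(b):
        pos_b.setdefault(w, i)  # keep the first occurrence only
    seen: Set[str] = set()
    seq = []
    for w in a:
        if w in pos_b and w not in seen:
            seen.add(w)
            seq.append(pos_b[w])
    n = len(seq)
    if n < 2:
        return 1.0 if n == 1 else 0.0  # no shared words means no order evidence
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if seq[i] > seq[j])
    return 1.0 - inversions / (n * (n - 1) / 2)


def fused_similarity(
    a: Sequence[str],
    b: Sequence[str],
    synonyms: Optional[Synonyms] = None,
    weights: Tuple[float, float, float] = (0.7, 0.1, 0.2),  # hypothetical weights
) -> float:
    synonyms = synonyms or {}
    wk, wl, wo = weights
    return (wk * keyword_similarity(a, b, synonyms)
            + wl * length_similarity(a, b)
            + wo * order_similarity(a, b))


if __name__ == "__main__":
    s1 = ["中国", "经济", "增长", "放缓"]
    s2 = ["放缓", "中国", "经济"]
    print(round(fused_similarity(s1, s2), 4))
```

In this example, `s2` drops one keyword and reorders the rest. The keyword score is `3 / (2 * sqrt(3))`, about 0.866, and the length score is 6/7. Only the order score falls sharply, to 1/3, because two of the three possible pairs of shared words are inverted. Keeping the features as separate functions makes it easy to replace the fixed weights with a learned or co-occurrence-based weighting scheme.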
Keywords: Sentence similarity, Keywords, Multi-feature fusion