
Hybrid Sentence Similarity Research Based On Semantic

Posted on: 2017-02-24 | Degree: Master | Type: Thesis
Country: China | Candidate: Q Li
GTID: 2428330488973274 | Subject: Computer application technology
Abstract/Summary:
With the development of modern science and technology, people demand ever more intelligent systems, which has driven the development of artificial intelligence. In the past, artificial intelligence mainly had computers perform tasks whose logic is relatively fixed and whose amount of computation is relatively large, freeing people from simple labor. Today people are no longer satisfied with such simple intelligent operations. They want computers to think in a human-like way and to converse with people in natural language, which first of all requires the computer to understand human language. Because human-language text is usually stored in the computer in the form of sentences, the computer must be able to understand individual sentences, and sentence similarity computation is one important aspect of natural language processing. Chinese text has a complex structure and shows both polysemy (one word with several meanings) and synonymy (one meaning expressed by several words). In view of these characteristics, this paper proposes a hybrid, semantics-based sentence similarity calculation method. The method analyzes and computes text similarity from three aspects: sentence structure, syntactic features and adjunct words (emotional and degree words).

1) Sentence structure: this covers the difference in sentence length, the number of identical words, the changes in position of identical words, and the edit distance between the two texts (the number of edits needed to transform one into the other). When calculating structural similarity, the surface-structure characteristics of the text must be considered as a whole, because a change in each structural feature affects the semantics to a different degree. The surface-structure similarity is obtained as a weighted sum of the similarity values of the individual features.

2) Syntactic features: the sentence is first segmented into words, which turns the originally structured sentence into a number of separate, unstructured words. Then, through syntactic analysis of the dependency relations between the words, these unstructured words are converted into simple sentences with the structure "subject + predicate + object + prepositional phrase". The sentence similarity is obtained by computing the similarity of each simple sentence. When computing the similarity of each pair of sentences with this structure, the displacement of each structural component must also be taken into account. The paper further fuses the component information, so that the sentence similarity is ultimately calculated from the similarities between subject and subject, predicate and predicate, object and object, and prepositional phrase and prepositional phrase.

3) Adjunct words: this paper divides adjunct words into three types: positive emotional words, negative emotional words and degree words. Among these three types, degree adverbs take priority in their effect on text semantics. If the sentence similarity is still above a certain threshold after the degree adverbs have been taken into account, the influence of positive and negative emotional words on text similarity is then considered.

To compute sentence similarity, this paper first calculates the structure-based similarity. It then calculates the syntactic-feature-based similarity from the similarities between words. Next, it calculates the adjunct-word-based similarity on top of the syntactic-feature similarity. Finally, the sentence similarity is obtained as a weighted sum of the structural and syntactic-feature similarities. The method is evaluated with the Pearson correlation coefficient, precision, recall and F-measure. In the experiments, the similarity results calculated based on structure and on syntactic features are 0.42, 0.54, 0.83 and 0.89. The hybrid semantics-based sentence similarity method achieves a precision of 84.85%, a recall of 93.33% and an F-measure of 88.89%.
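The surface-structure features (length difference, shared words, position changes of shared words, edit distance) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation. Sentences are assumed to be already segmented into word lists. The feature formulas (length ratio, Dice overlap, pairwise order agreement, normalised word-level edit distance) and the weights in the hypothetical `structure_similarity` function are illustrative choices, because the abstract does not give them.

```python
# Hypothetical sketch: surface-structure similarity of two pre-segmented
# sentences. The four features follow the abstract; the formulas and the
# weights are illustrative assumptions, not the thesis's definitions.
from typing import Sequence, Tuple


def length_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    # 1.0 for equal lengths, decreasing as the length difference grows.
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - abs(len(a) - len(b)) / longest


def common_word_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    # Dice coefficient over the sets of distinct words.
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return 2 * len(sa & sb) / (len(sa) + len(sb))


def order_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    # Fraction of pairs of shared words that keep their relative order
    # (first occurrences are used for repeated words).
    in_b = set(b)
    common = [w for w in dict.fromkeys(a) if w in in_b]
    if len(common) < 2:
        return 1.0
    pos_b = {w: list(b).index(w) for w in common}
    n = len(common)
    inversions = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if pos_b[common[i]] > pos_b[common[j]]
    )
    return 1.0 - inversions / (n * (n - 1) / 2)


def edit_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    # Word-level Levenshtein distance, normalised by the longer sentence.
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        cur = [i]
        for j, wb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (wa != wb)))  # substitution
        prev = cur
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - prev[-1] / longest


def structure_similarity(a: Sequence[str], b: Sequence[str],
                         weights: Tuple[float, ...] = (0.2, 0.3, 0.2, 0.3)) -> float:
    # Weighted sum of the four feature similarities; the weights are hypothetical.
    features = (length_similarity(a, b), common_word_similarity(a, b),
                order_similarity(a, b), edit_similarity(a, b))
    return sum(w * f for w, f in zip(weights, features))
```

With these default weights, `structure_similarity("我 喜欢 读书".split(), "我 爱 读书".split())` comes out at about 0.8. The lengths and word order agree fully, while the overlap and edit features each score 2/3.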
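The clause-level comparison of "subject + predicate + object + prepositional phrase" structures might look roughly like this. It assumes an upstream dependency parser has already reduced each sentence to such clauses. The following are all hypothetical:

- the `Clause` type;
- the slot weights;
- the subject/object swap penalty, used here to model component displacement;
- the best-match averaging over clauses.

`char_overlap` is only a stand-in for a real semantic word-similarity measure.

```python
# Hypothetical sketch: similarity of sentences already reduced to
# "subject + predicate + object + prepositional phrase" clauses.
# Dependency parsing is assumed to have happened upstream; the word-similarity
# function, slot weights and swap penalty are illustrative assumptions.
from dataclasses import dataclass
from typing import Callable, Sequence

WordSim = Callable[[str, str], float]

SLOT_WEIGHTS = (0.3, 0.3, 0.3, 0.1)  # subject, predicate, object, prep phrase
SWAP_PENALTY = 0.8                   # discount when subject/object are displaced


@dataclass(frozen=True)
class Clause:
    subject: str = ""
    predicate: str = ""
    obj: str = ""
    prep_phrase: str = ""


def char_overlap(u: str, v: str) -> float:
    # Placeholder word similarity (character Jaccard); a semantic word
    # similarity measure would be used here instead.
    if not u and not v:
        return 1.0  # a slot missing from both clauses counts as matching
    su, sv = set(u), set(v)
    return len(su & sv) / len(su | sv)


def clause_similarity(c1: Clause, c2: Clause,
                      word_sim: WordSim = char_overlap) -> float:
    ws, wp, wo, wq = SLOT_WEIGHTS
    aligned = ws * word_sim(c1.subject, c2.subject) + wo * word_sim(c1.obj, c2.obj)
    # Component displacement: also try matching subject with object.
    swapped = ws * word_sim(c1.subject, c2.obj) + wo * word_sim(c1.obj, c2.subject)
    core = max(aligned, SWAP_PENALTY * swapped)
    return (core
            + wp * word_sim(c1.predicate, c2.predicate)
            + wq * word_sim(c1.prep_phrase, c2.prep_phrase))


def syntactic_similarity(s1: Sequence[Clause], s2: Sequence[Clause],
                         word_sim: WordSim = char_overlap) -> float:
    # Match every clause with its best counterpart in both directions and
    # average the two directed scores so the result is symmetric.
    if not s1 or not s2:
        return 0.0

    def directed(xs: Sequence[Clause], ys: Sequence[Clause]) -> float:
        return sum(max(clause_similarity(x, y, word_sim) for y in ys)
                   for x in xs) / len(xs)

    return (directed(s1, s2) + directed(s2, s1)) / 2
```

For example, `Clause("cat", "chase", "dog")` and `Clause("dog", "chase", "cat")` score 0.88 here rather than 1.0. The displaced subject and object still match, but only at the discounted rate.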
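The priority rule for adjunct words (degree adverbs first, then emotional words only above a threshold) can be sketched as a two-step adjustment of an existing similarity score. The lexicons, intensity factors, threshold and penalty below are toy assumptions. Scaling by the ratio of degree intensities is just one plausible way to blend in degree adverbs; the abstract does not specify the formula.

```python
# Hypothetical sketch of the adjunct-word step: degree adverbs are applied
# first; positive/negative emotional words are consulted only if the
# similarity is still above a threshold. Lexicons, threshold and penalty
# are toy assumptions.
from typing import Sequence

POSITIVE = {"good", "happy", "excellent"}
NEGATIVE = {"bad", "sad", "terrible"}
DEGREE = {"slightly": 0.5, "very": 2.0, "extremely": 3.0}  # intensity factors
THRESHOLD = 0.5
OPPOSITE_SENTIMENT_PENALTY = 0.5


def degree_strength(tokens: Sequence[str]) -> float:
    # Product of the intensities of the degree adverbs present (1.0 if none).
    strength = 1.0
    for t in tokens:
        strength *= DEGREE.get(t, 1.0)
    return strength


def polarity(tokens: Sequence[str]) -> int:
    # Count of positive emotional words minus count of negative ones.
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)


def adjunct_adjusted(sim: float, a: Sequence[str], b: Sequence[str]) -> float:
    # Step 1 (higher priority): scale by the ratio of degree intensities.
    da, db = degree_strength(a), degree_strength(b)
    sim *= min(da, db) / max(da, db)
    # Step 2: sentiment is considered only while sim is above the threshold;
    # opposite orientations are penalised.
    if sim > THRESHOLD and polarity(a) * polarity(b) < 0:
        sim *= OPPOSITE_SENTIMENT_PENALTY
    return sim
```

Here `adjunct_adjusted(0.9, ["good"], ["bad"])` halves the score, while `adjunct_adjusted(0.8, ["very", "good"], ["bad"])` first drops to 0.4 through the degree step and therefore never reaches the sentiment check.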
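The final weighted fusion and the reported figures can be written out as follows. The weights α and β are hypothetical symbols, because the abstract does not state their values. Sim_syn is taken to already include the adjunct-word adjustment. The second line checks that the reported F value follows from the reported accuracy/precision P = 84.85% and recall R = 93.33% under the standard balanced F-measure.

```latex
\begin{align*}
\mathrm{Sim}(S_1,S_2) &= \alpha\,\mathrm{Sim}_{\mathrm{struct}}(S_1,S_2)
  + \beta\,\mathrm{Sim}_{\mathrm{syn}}(S_1,S_2), \qquad \alpha + \beta = 1,\\
F &= \frac{2PR}{P+R}
   = \frac{2 \times 0.8485 \times 0.9333}{0.8485 + 0.9333}
   \approx \frac{1.5838}{1.7818} \approx 0.8889 .
\end{align*}
```

The result matches the reported F value of 88.89%.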
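The evaluation measures named in the abstract (Pearson correlation, precision, recall, F-measure) can be computed as in the following minimal sketch. Two things here are assumptions made for illustration: the function names, and turning similarity scores into binary "similar / not similar" decisions with a fixed threshold. The abstract does not say how those decisions were made.

```python
# Hypothetical sketch of the evaluation metrics: Pearson correlation between
# system scores and human scores, and precision/recall/F-measure after
# thresholding scores into "similar" (True) or "not similar" (False).
import math
from typing import List, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    # Assumes both sequences have the same length and non-zero variance.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def to_labels(scores: Sequence[float], threshold: float = 0.5) -> List[bool]:
    # The decision threshold is a hypothetical parameter.
    return [s >= threshold for s in scores]


def precision_recall_f(pred: Sequence[bool],
                       gold: Sequence[bool]) -> Tuple[float, float, float]:
    tp = sum(p and g for p, g in zip(pred, gold))
    fp = sum(p and not g for p, g in zip(pred, gold))
    fn = sum(g and not p for p, g in zip(pred, gold))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f
```

Keeping Pearson on the raw scores and P/R/F on thresholded labels separates two questions. The first is how well the scores rank pairs. The second is how well a chosen cut-off classifies them.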
Keywords/Search Tags: Word Similarity, Syntactic Analysis, Dependency Syntax, Syntactic Features, Adjunct Words