Sentence Similarity Computing Based On Semantic Tree Kernel

Posted on: 2009-10-13
Degree: Master
Type: Thesis
Country: China
Candidate: L J Wang
Full Text: PDF
GTID: 2178360272470382
Subject: Computer application technology
Abstract/Summary:
Sentence similarity computing is widely used in Natural Language Processing, for example in question answering systems, automatic summarization systems, information retrieval systems, and example-based machine translation (EBMT) systems, and it has therefore long attracted research interest. This thesis studies similarity computing at various levels, with the emphasis on sentence similarity. I argue that the complete meaning of a Chinese sentence depends not only on its words but also on the relationships among them. I therefore propose a similarity computing method based on a semantic Tree Kernel and study syntactic features, semantic features, and word features in turn. The three features emphasize different aspects, complement each other, and give good results when used together for similarity computing.

First, the Tree Kernel is used to compute the syntactic-structure similarity of Chinese sentences. The most direct form of a sentence's nested structure is a tree, which represents syntactic information effectively. Moreover, when example sentences are compared with candidate sentences, the similarity of two structures is reflected not only in individual branches of the syntactic structure but also in the overall structure of the sentence. The Tree Kernel can accurately match the syntax of two sentences.

Second, this thesis computes the semantic similarity of the terms in a sentence. Using a synonym dictionary, keywords are extracted from each of the two sentences, redundant information is eliminated, and the semantic similarity of the keywords is computed.

Third, this thesis considers the word similarity of two sentences, that is, same-word similarity, which is measured by the number of words the two sentences share.

Finally, I propose a multi-feature merging method for combining the three features.
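The syntactic step can be illustrated with a small sketch. The abstract does not specify which tree kernel the thesis uses, so the code below assumes the common subset-tree kernel (counting shared tree fragments with a decay factor) and a normalization to the range 0 to 1. The tuple-based tree format, the function names, and the `decay` parameter are all assumptions made for illustration, not the author's implementation.

```python
from math import sqrt

# Assumed tree format: (label, child, child, ...); a leaf (word) is a plain string.

def production(node):
    """Return the grammar production at a node: its label plus its child labels."""
    label, *children = node
    # Wrap subtree labels in a tuple so a leaf "x" never equals a subtree labelled "x".
    return label, tuple(c if isinstance(c, str) else (c[0],) for c in children)

def internal_nodes(tree):
    """Yield every non-leaf node of the tree."""
    if isinstance(tree, str):
        return
    yield tree
    for child in tree[1:]:
        yield from internal_nodes(child)

def common_fragments(n1, n2, decay):
    """Decay-weighted count of tree fragments rooted at both n1 and n2."""
    if production(n1) != production(n2):
        return 0.0
    score = decay
    for c1, c2 in zip(n1[1:], n2[1:]):
        if not isinstance(c1, str):  # productions match, so c2 is a subtree too
            score *= 1.0 + common_fragments(c1, c2, decay)
    return score

def tree_kernel(t1, t2, decay=0.5):
    """Sum the shared-fragment scores over all pairs of internal nodes."""
    return sum(common_fragments(a, b, decay)
               for a in internal_nodes(t1)
               for b in internal_nodes(t2))

def tree_similarity(t1, t2, decay=0.5):
    """Normalize the kernel so that a tree compared with itself scores 1.0."""
    norm = sqrt(tree_kernel(t1, t1, decay) * tree_kernel(t2, t2, decay))
    return tree_kernel(t1, t2, decay) / norm if norm else 0.0

# Toy parse trees (hypothetical; a real system would use a Chinese parser's output).
example = ("S", ("N", "dog"), ("V", "runs"))
candidate = ("S", ("N", "cat"), ("V", "runs"))
print(tree_similarity(example, candidate, decay=1.0))
```

With `decay=1.0` the kernel simply counts shared fragments: the two toy trees share 3, each tree has 6 fragments of its own, and the normalized similarity is 0.5. A smaller decay reduces the weight of larger fragments.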
The syntactic feature, the semantic feature, and the same-word feature represent the structural information, the semantic information, and the surface information of a sentence, respectively. Their contributions to sentence similarity are adjusted by setting a weight for each feature. The experimental test set contains 6000 sentences: 5000 form a noise set, and the other 1000, obtained by hand, form the standard set. Applying the proposed method to this test set yields a precision of 91.3%.
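The same-word feature and the weighted merge can be sketched as follows. The abstract gives neither the exact overlap formula nor the weight values, so the Dice-style overlap, the function names, and the default weights below are assumptions chosen only for illustration. Sentences are assumed to be already segmented into word lists.

```python
from collections import Counter

def same_word_similarity(words1, words2):
    """Dice-style overlap: 2 * shared words / total words (assumed formula)."""
    shared = sum((Counter(words1) & Counter(words2)).values())
    total = len(words1) + len(words2)
    return 2.0 * shared / total if total else 0.0

def merged_similarity(syntax, semantic, same_word, weights=(0.4, 0.4, 0.2)):
    """Weighted average of the three feature scores.

    The default weights are hypothetical placeholders, not values from the thesis.
    """
    w_syn, w_sem, w_word = weights
    total = w_syn + w_sem + w_word
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return (w_syn * syntax + w_sem * semantic + w_word * same_word) / total

# Usage with made-up feature scores for one sentence pair.
surface = same_word_similarity(["the", "dog", "runs"], ["the", "cat", "runs"])
print(merged_similarity(syntax=0.5, semantic=0.8, same_word=surface))
```

Dividing by the sum of the weights keeps the merged score in the same 0 to 1 range as the individual features, so the weights can be tuned freely without rescaling.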
Keywords/Search Tags: Natural Language Processing, Similarity, Multi-features, Tree Kernel, Weight