Sentence Similarity Computing Based On Semantic Tree Kernel

Posted on:2009-10-13

Degree:Master

Type:Thesis

Country:China

Candidate:L J Wang

Full Text:PDF

GTID:2178360272470382

Subject:Computer application technology

Abstract/Summary:


Sentence similarity computation is widely used in Natural Language Processing, for example in question answering systems, automatic summarization systems, information retrieval systems, and example-based machine translation (EBMT) systems, and it has therefore attracted research interest for a long time. This thesis studies similarity computation at several levels, with the emphasis on sentence similarity. The complete meaning of a Chinese sentence depends not only on its words but also on the relationships among them. I therefore propose a sentence similarity method based on a semantic Tree Kernel and study three features separately: syntactic, semantic, and word features. The three features emphasize different aspects of a sentence, complement one another, and give good results when combined for similarity computation.
First, a Tree Kernel is used to compute the syntactic structure similarity of Chinese sentences. A tree is the most direct representation of a sentence's nested structure and captures its syntactic information effectively. Moreover, when an example sentence is compared with a candidate sentence, the similarity of their structures appears not only within individual branches of the syntax tree but also in the overall sentence structure. The Tree Kernel can match the syntax of two sentences accurately.
Second, this thesis computes the semantic similarity of the terms in a sentence: using a synonym dictionary, keywords are extracted from each of the two sentences, redundant information is eliminated, and the semantic similarity of the keywords is computed.
Third, it computes the word similarity of the two sentences, that is, same-word similarity, measured by the number of words the two sentences have in common.
Finally, I propose a multi-feature merging method that combines the three features.
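The abstract does not say which tree kernel variant is used. As a rough illustration of the syntactic step, the sketch below implements the standard subset-tree kernel of Collins and Duffy, which counts the tree fragments two parse trees share (down-weighted by a decay factor `lam`) and normalizes the result to [0, 1]. This is a swapped-in textbook technique, not necessarily the thesis's "semantic" Tree Kernel; the nested-tuple tree format, the function names, and `lam=0.5` are assumptions, and the parse trees are written by hand rather than produced by a parser.

```python
import math
from functools import lru_cache

# Assumed tree format: a node is (label, child1, child2, ...); a word is a str.
# e.g. ("NP", ("NN", "苹果")) is an NP over a single noun.


def production(node):
    """Grammar rule at a node: its label plus the labels (or words) of its children."""
    label, *children = node
    return (label, tuple(c if isinstance(c, str) else c[0] for c in children))


def is_preterminal(node):
    """True if every child is a word, e.g. ("NN", "苹果")."""
    return all(isinstance(c, str) for c in node[1:])


def internal_nodes(tree):
    """All non-word nodes of the tree, in pre-order."""
    nodes = [tree]
    for child in tree[1:]:
        if not isinstance(child, str):
            nodes.extend(internal_nodes(child))
    return nodes


def tree_kernel(t1, t2, lam=0.5):
    """Collins-Duffy subset-tree kernel: weighted count of shared tree fragments."""

    @lru_cache(maxsize=None)
    def delta(n1, n2):
        # Fragments rooted at n1 and n2 can only match if their rules are identical.
        if production(n1) != production(n2):
            return 0.0
        if is_preterminal(n1):
            return lam
        result = lam
        for c1, c2 in zip(n1[1:], n2[1:]):
            if not isinstance(c1, str):  # identical words add no extra factor
                result *= 1.0 + delta(c1, c2)
        return result

    return sum(delta(a, b) for a in internal_nodes(t1) for b in internal_nodes(t2))


def normalized_tree_kernel(t1, t2, lam=0.5):
    """Kernel scaled to [0, 1] so that a tree compared with itself scores 1."""
    k11, k22 = tree_kernel(t1, t1, lam), tree_kernel(t2, t2, lam)
    if k11 == 0 or k22 == 0:
        return 0.0
    return tree_kernel(t1, t2, lam) / math.sqrt(k11 * k22)


if __name__ == "__main__":
    s1 = ("S", ("NP", ("PN", "我")), ("VP", ("VV", "喜欢"), ("NP", ("NN", "苹果"))))
    s2 = ("S", ("NP", ("PN", "我")), ("VP", ("VV", "喜欢"), ("NP", ("NN", "香蕉"))))
    print(normalized_tree_kernel(s1, s1))  # identical trees: 1.0 up to rounding
    print(normalized_tree_kernel(s1, s2))  # only the object noun differs: between 0 and 1
```

Because the kernel sums over fragments of every size, it rewards both matching local branches and a matching overall structure. Normalizing by the self-similarities keeps longer sentences from scoring higher simply because they have more nodes, and a smaller `lam` reduces the dominance of large shared fragments.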
The syntactic, semantic, and same-word features represent the structural, semantic, and surface information of a sentence, respectively, and the contribution of each feature to the overall sentence similarity is adjusted by setting its weight. The experimental test set contains 6000 sentences: 5000 of them form a noisy set, and the other 1000 were collected by hand to form a standard set. Applied to this test set, the proposed method achieves a precision of 91.3%.
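The abstract measures same-word similarity by the number of shared words and merges the three features by weighting, but it gives neither the exact normalization nor the weight values. The sketch below is one possible reading under stated assumptions: Dice-style normalization (2 × shared / total) for both the same-word feature and the synonym-based keyword feature, a hypothetical toy synonym dictionary `SYNONYM_GROUPS`, a hypothetical stop-word list `STOP_WORDS`, and placeholder weights (0.4, 0.4, 0.2). The syntactic score is taken as an input (for example, a normalized tree-kernel value), and sentences are assumed to be already word-segmented.

```python
# Hypothetical toy synonym dictionary mapping each word to a synonym-group ID.
# A real system would load a full synonym thesaurus instead.
SYNONYM_GROUPS = {
    "喜欢": "G01", "喜爱": "G01",
    "电脑": "G02", "计算机": "G02",
}

# Hypothetical stop-word list: function words dropped before keyword comparison.
STOP_WORDS = {"的", "了", "吗", "是"}


def dice(a, b):
    """Dice overlap 2|A & B| / (|A| + |B|); 0.0 if both sets are empty."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def same_word_similarity(words1, words2):
    """Surface feature: overlap of the literal words of two segmented sentences."""
    return dice(set(words1), set(words2))


def keywords(words):
    """Drop stop words and replace each word by its synonym group, if it has one."""
    return {SYNONYM_GROUPS.get(w, w) for w in words if w not in STOP_WORDS}


def semantic_similarity(words1, words2):
    """Semantic feature: overlap of keywords after synonym normalization."""
    return dice(keywords(words1), keywords(words2))


def combined_similarity(syntax_sim, words1, words2, weights=(0.4, 0.4, 0.2)):
    """Weighted sum of the syntactic, semantic and same-word features.

    syntax_sim is assumed to be a normalized tree-kernel score in [0, 1];
    the default weights are placeholders, not values from the thesis.
    """
    w_syn, w_sem, w_word = weights
    if abs(w_syn + w_sem + w_word - 1.0) > 1e-9:
        raise ValueError("feature weights must sum to 1")
    return (w_syn * syntax_sim
            + w_sem * semantic_similarity(words1, words2)
            + w_word * same_word_similarity(words1, words2))


if __name__ == "__main__":
    a = ["我", "喜欢", "苹果"]
    b = ["我", "喜爱", "苹果"]
    print(same_word_similarity(a, b))  # 2 shared words out of 3 + 3 -> 0.666...
    print(semantic_similarity(a, b))   # 喜欢/喜爱 share a group -> 1.0
    print(combined_similarity(1.0, a, b))
```

In this example the synonym step lifts the semantic score above the same-word score, which illustrates how the features complement each other; tuning the weights on held-out data would then set how much each feature contributes.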

Keywords/Search Tags:

Natural Language Processing, Similarity, Multi-features, Tree Kernel, Weight


Related items

1	A Sentence Representation Method Based On Syntax And Semantic
2	The Research On Measuring Text Similarity Based On Word Vector Enhanced Tree Kernel Model
3	Sentence Similarity Computing Combining Multi-features Based On HowNet
4	Research And Implementation Of Subjective Automated Assessment System Based On Natural Language Processing
5	Chinese Sentences Similarity Computation And Its Application In Question-Answering System
6	Research On A Method Of Calculating Sentence Similarity By Comprehending Multi-level Information
7	Research On Machine Learning For Natural Language Processing And Transmission
8	The Design And Implementation Of Legal Service System Based On Natural Language Processing
9	Research On Dependency Tree Kernel-based Semantic Role Labeling
10	The Research Of Semantic Similarity Computing Algorithm Based On HowNet