Research On Semantic Textual Similarity

Posted on: 2015-07-27    Degree: Master    Type: Thesis
Country: China    Candidate: S Wang
GTID: 2308330461484953    Subject: Systems Engineering
Abstract/Summary:
In the era of information explosion, the amount of information we are exposed to grows exponentially. To give people quick and convenient access to information, Semantic Textual Similarity (STS) is being applied ever more widely across Natural Language Processing, in areas such as Information Retrieval, Question Answering and Machine Translation. The performance of an STS model directly affects the quality of the Natural Language Processing systems built on it.

Based mainly on the FrameNet resource, this thesis analyzes data from news, videos, glosses and Machine Translation evaluation for the STS task. It measures the similarity between two text snippets in terms of word overlap, syntax and semantics, using a Linear Interpolation Model (LIM).

The contents of this research are as follows:

(1) A textual similarity model is proposed that is based on the FrameNet resource, the WordNet thesaurus and the Vector Space Model (VSM); LIM is used to integrate these three basic models. The average Pearson product-moment correlation coefficient of this LIM-based model integrating FrameNet information is 0.5458.

(2) The LIM-based model, which contains deep syntactic-semantic relations, is compared with a tree kernel textual similarity model that uses syntactic information only. The comparison shows that the STS model with deep syntactic-semantic relations is more stable across all datasets.

The major contribution of this thesis is that the FrameNet resource is introduced for computing the similarity of two English texts, and that the difference between deep syntactic-semantic relations and purely syntactic relations is analyzed by comparison with the tree kernel STS model. These conclusions can serve as evidence for future research on open-ended, large-scale STS algorithms. In addition, in the official results of the 2013 STS task, the LIM-based model ranked 14th out of 89 submitted textual similarity models, and 3rd on the SMT dataset.
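The interpolation and evaluation steps described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the example weights and the FrameNet and WordNet scores are hypothetical placeholders, and the bag-of-words cosine is only a simple stand-in for the VSM component. The abstract does not state the interpolation weights that were actually used.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Bag-of-words cosine similarity (simple stand-in for a VSM component)."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def interpolate(scores, weights):
    """Linear interpolation: weighted sum of component scores, weights sum to 1."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * s for w, s in zip(weights, scores))


def pearson(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


# Hypothetical component scores for one sentence pair. The FrameNet and
# WordNet values are made-up placeholders, not outputs of real resources.
vsm_score = cosine_similarity("a man is playing a guitar",
                              "a man plays the guitar")
framenet_score = 0.8   # placeholder
wordnet_score = 0.7    # placeholder

# Hypothetical weights; in practice they would be tuned on training data.
combined = interpolate([framenet_score, wordnet_score, vsm_score],
                       [0.4, 0.3, 0.3])
```

System quality would then be estimated by calling `pearson` on the list of combined scores and the list of gold-standard similarity judgments for the same sentence pairs.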
Keywords/Search Tags: Semantic Textual Similarity, FrameNet, Vector Space Model, WordNet, Tree Kernel