The Study Of Measures And Applications Of Short Text Semantic Similarity

Posted on:2015-01-29

Degree:Master

Type:Thesis

Country:China

Candidate:T T Zhu

GTID:2268330431959087

Subject:Computer application technology

Abstract/Summary:

Text semantic similarity measures the degree of semantic equivalence between two texts. It plays an important role in natural language processing (NLP) and underlies many downstream applications. Previous research has proposed many kinds of similarity features and has shown that combining multiple kinds of features achieves better results than using a single kind. The main work of this thesis is therefore to propose and combine more diverse similarity features, which are expected to capture more complete text information and to improve the performance of short text similarity models.

We first present a sentence-level short text similarity model that combines 7 kinds of similarity features: string features, knowledge-based features, corpus-based features, syntactic features, machine translation based features, multi-level text features, and other features. This makes our feature set the most complete to date. A supervised regression algorithm is then used to build the model. The experimental results show that combining diverse similarity features improves the performance of the short text similarity model.

Previous work has seldom focused on cross-level semantic similarity. The second contribution of this thesis is to extend short text similarity from the sentence level to the cross-level setting, using a recently released benchmark dataset for cross-level text similarity. Specifically, we build four similarity models for four cross-level pairings: paragraph-sentence, sentence-phrase, phrase-word, and word-sense. The experimental results on the corresponding datasets show that model performance decreases as the texts get shorter, from paragraphs down to words. A possible reason is that the more information a text contains, the better the model performs, and vice versa. To address the missing information in phrases and words, we propose a new method that extends their information with the aid of WordNet. The experimental results show that the proposed information-extension method improves performance.

To validate the proposed short text similarity model, we applied it to two NLP tasks: paraphrase recognition and textual entailment. The results on paraphrase recognition are good, which indicates that the model is suitable for this task. The results on textual entailment are much worse than expected, but they can still serve as a baseline for that task.
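As a rough illustration of the feature-combination idea described in the abstract, the following minimal sketch computes a few string-level similarity features for a sentence pair and fits a linear regression on them. It is not the thesis's implementation: the three features, the function names, the gradient-descent fitting, and the toy sentence pairs with their gold scores are all assumptions made for this example, and the knowledge-based, corpus-based, syntactic, and machine translation based features mentioned above are omitted.

```python
from typing import List, Sequence, Set, Tuple


def word_jaccard(a: str, b: str) -> float:
    """Jaccard overlap of the two lowercase word sets."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def char_ngrams(text: str, n: int = 3) -> Set[str]:
    """Set of character n-grams of the lowercase text."""
    t = text.lower()
    return {t[i:i + n] for i in range(len(t) - n + 1)}


def char_ngram_dice(a: str, b: str, n: int = 3) -> float:
    """Dice coefficient over character n-gram sets."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    if not ga and not gb:
        return 1.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def length_ratio(a: str, b: str) -> float:
    """Ratio of the shorter to the longer text, counted in words."""
    la, lb = len(a.split()), len(b.split())
    if max(la, lb) == 0:
        return 1.0
    return min(la, lb) / max(la, lb)


def features(a: str, b: str) -> List[float]:
    """Feature vector for one text pair (string features only)."""
    return [word_jaccard(a, b), char_ngram_dice(a, b), length_ratio(a, b)]


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Linear model: weighted sum of the features plus a bias."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def fit_linear(X: List[List[float]], y: List[float],
               lr: float = 0.1, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit the linear model by batch gradient descent on mean squared error."""
    n_feat, m = len(X[0]), len(X)
    w, b = [0.0] * n_feat, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_feat, 0.0
        for xi, yi in zip(X, y):
            err = predict(w, b, xi) - yi
            for j in range(n_feat):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


# Toy training pairs with made-up gold similarity scores on a 0-5 scale.
TOY_PAIRS = [
    ("a man is playing a guitar", "a man is playing a guitar", 5.0),
    ("a man is playing a guitar", "a man plays the guitar", 4.0),
    ("a dog runs in the park", "a cat sleeps on the sofa", 1.0),
    ("the stock market fell today", "children are eating lunch", 0.0),
]

X = [features(a, b) for a, b, _ in TOY_PAIRS]
y = [score for _, _, score in TOY_PAIRS]
weights, bias = fit_linear(X, y)

query = features("a woman is playing a guitar", "a man is playing a guitar")
print("predicted similarity:", predict(weights, bias, query))
```

Each additional feature family would simply append more values to the vector returned by `features`; the regression step stays the same. With realistic data, a library regressor would normally replace the hand-written gradient descent used here to keep the sketch dependency-free.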

Keywords/Search Tags:

short text semantic similarity, cross level text similarity, similarity features, machine learning, regression algorithm

Related items

1	Research And Application Of Short Text Semantic Similarity Model Based On Deep Learning
2	A Short Texts Matching Method Using Multi-level Features
3	Research On Short Text Semantic Similarity Computation
4	The Research And Application Of Unsupervised And Supervised Short Text Similarity Measure
5	Research On The Calculation Method Of Han-Thai Bilingual News Text Similarity With News Elements
6	Research And Implementation Of Semantic Similarity Computing By Combining Knowledge-based And Corpus-based Methods
7	Research On Semantic Similarity Measurement For Text
8	Clustering Algorithm Research Of Short Text Based On Semantic Similarity
9	Chinese-Old Bilingual Text And Sentence Similarity Calculation Research
10	Key Technology Research On Short Text Similarity