Research On Short Text Semantic Similarity Computation

Posted on: 2017-11-03
Degree: Master
Type: Thesis
Country: China
Candidate: K Li
Full Text: PDF
GTID: 2348330518470924
Subject: Engineering
Abstract/Summary:
Semantic similarity in text mining has attracted close attention in both academia and industry. It has been widely studied in information retrieval, automatic question answering, text classification, natural language processing, and machine learning. Short text semantic similarity computation measures the semantic similarity between two short texts. Researchers have proposed a variety of similarity measures for this problem, mainly word co-occurrence measures, measures based on grammatical structure, and semantic feature measures. Word co-occurrence methods do not work well on short texts because the length of a short text is limited. Methods based on syntactic structure assign weights to different sentence elements through syntactic analysis and then extract grammatical information from the text. Semantic feature measures use background knowledge to learn the semantic information of words and are well suited to computing the similarity of synonyms. However, they lack a consistent representation framework for non-synonymous words and for words that serve as different sentence constituents.

To address these issues, and based on the properties of short text, this thesis constructs a multi-level structure and proposes a multi-level feature fusion method that obtains more complete information from the text and thereby improves the accuracy of short text semantic similarity computation. First, the model combines six kinds of text similarity features: lexical features, corpus-based features, grammatical features, syntactic features, diversified combination features, and other features. Second, dimensionality reduction is applied to these features to reduce redundancy and noise. Third, a boosting algorithm, an ensemble learning model, is studied and used to train a multi-class classification model and improve its generalization. Finally, the proposed multi-level feature fusion method is compared with existing methods to validate its effect on short text semantic similarity computation. The experimental results show that the proposed method effectively improves the accuracy of semantic similarity computation for short texts.
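
The first stage described above, turning a pair of short texts into a vector of similarity features, can be sketched roughly as follows. This is only an illustration under stated assumptions: the three features (`jaccard`, `cosine`, `length_ratio`) and the whitespace tokenizer are hypothetical stand-ins chosen for brevity, not the six feature families used in the thesis, and the dimensionality reduction and boosting stages are left out.

```python
import math
from collections import Counter


def tokens(text):
    """Lowercase and split on whitespace (a stand-in for real tokenization)."""
    return text.lower().split()


def jaccard(a, b):
    """Word co-occurrence overlap: shared words divided by all distinct words."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def cosine(a, b):
    """Cosine similarity between bag-of-words count vectors."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def length_ratio(a, b):
    """Shorter token count divided by longer token count."""
    longest = max(len(a), len(b))
    return min(len(a), len(b)) / longest if longest else 0.0


def feature_vector(text_a, text_b):
    """Fuse several similarity measures into one vector for a downstream classifier."""
    a, b = tokens(text_a), tokens(text_b)
    return [jaccard(a, b), cosine(a, b), length_ratio(a, b)]


if __name__ == "__main__":
    # Two paraphrases that share only one word.
    print(feature_vector("a quick car", "a fast automobile"))
```

For the paraphrase pair in the example, the Jaccard overlap is only 0.2 even though the two texts mean nearly the same thing, which illustrates why word co-occurrence alone is weak on short texts and why the abstract argues for combining it with semantic and syntactic features.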
Keywords/Search Tags:short text, semantic, similarity, feature fusion, ensemble learning