Research On Sentence Intention Matching Methods

Posted on: 2022-02-10
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Liu
GTID: 1488306569485224
Subject: Computer application technology
Abstract/Summary:
Sentence intention matching is a core technology in automatic question answering and information retrieval systems. It aims to judge the degree of semantic matching between two sentences, and it has shown important value in practical applications. In recent years, with the development of deep learning, related research has made great breakthroughs. Although deep learning methods have been widely used for sentence intention matching, they still face the following challenges: 1) insufficient training samples, since many practical applications cannot provide enough training data; 2) limited ability of current models to encode intention information effectively, because sentences can be expressed flexibly; 3) limited improvement in matching performance, because ambiguity in word expression has not attracted enough attention; 4) unsuitability of general intention matching methods in certain scenarios, because the matching pairs are complex and differ across scenarios. To overcome these challenges, this dissertation studies sentence intention matching methods. The main research content covers the following four aspects.

First, to address the shortage of training samples, we propose a method for constructing a large-scale Chinese question intention matching corpus. We first used a search engine to collect large-scale question pairs, derived from the search results for high-frequency words in multiple domains. We then used an unsupervised Wasserstein-based distance algorithm to filter out irrelevant question pairs. Finally, we recruited annotators to label each remaining question pair according to whether the two questions share the same intention, obtaining a question intention matching corpus of 260,068 question pairs. The corpus is divided into a training set, a validation set, and a test set, and several well-known sentence intention matching algorithms were evaluated on it. The experimental results not only demonstrate the quality of the corpus but also provide reliable baseline performance for further research on it.

Second, to address the limited ability to encode intention information in sentences, we propose an intention matching method based on sentence semantic differences. We first extracted the lexical differences between the two sentences. We then used a neural language model to encode these lexical differences into a semantic difference feature representation, and integrated this representation into an existing intention matching method through a gate mechanism. The method was validated on the large-scale Chinese question intention matching corpus and on an English question intention matching corpus. Experimental results show that the method effectively learns the intention information in a sentence and improves intention matching performance, achieving a new state of the art relative to the published benchmark methods.

Third, to address ambiguity in the word-level representation of semantic information, we propose a word vector decomposition method that learns word senses for intention matching. First, a polysemous word in the sentence is represented by a pre-trained unsupervised word vector, which is passed into a capsule neural network. The capsule network decomposes the word vector into multiple sememe-like vectors. Second, a neural language model encodes the sentence to obtain a contextual representation. Finally, an attention mechanism integrates the contextual representation with the sememe-like vectors to generate a context-specific word sense vector. During training, a word sense matching objective is used to learn the word sense explicitly. The learned word sense vectors were applied to the English question intention matching corpus for validation. The experimental results show that, compared with the unsupervised word vector, the word sense vector obtained by this method captures word sense semantics more accurately and further improves the performance of sentence intention matching.

Fourth, to address the unsuitability of general intention matching methods in certain scenarios caused by the complexity of the matching pairs, we propose an intention-based method for matching medical knowledge with medical literature. The method is based on the intention information contained in the medical knowledge and the literature. We used relation and topic capsule networks to learn the relation features and the topic features in medical literature, respectively, and integrated the learned relation and topic information into the matching algorithm as intention information. The method was validated on a medical literature retrieval task using a manually labeled matching test set and a ranking test set. The experimental results show that it outperforms the published benchmark methods on various evaluation metrics. The results also show that the intention information in medical knowledge and literature is effective and enhances the ability of matching methods in the medical literature retrieval scenario.

In summary, this dissertation presents in-depth research on sentence intention matching methods. For the four challenges identified above, it proposes, respectively, a method for constructing a large-scale intention matching corpus, an intention matching method based on sentence semantic differences, a word vector decomposition method that learns word senses for intention matching, and an intention-based method for matching medical knowledge with literature. The methods were validated on their corresponding datasets and achieved promising results.
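The abstract states that the semantic difference representation is integrated into an existing matching method through a gate mechanism, but it does not give the formulation. The following is a minimal sketch of one common form of gated fusion, offered only as an illustration and not as the dissertation's actual model. The function name `gated_fusion`, the parameter names, and the specific rule (an element-wise sigmoid gate computed from the concatenation of both vectors) are all assumptions.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gated_fusion(
    match_vec: Sequence[float],
    diff_vec: Sequence[float],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> List[float]:
    """Fuse a matching representation with a semantic difference representation.

    Assumed (hypothetical) formulation:
        g     = sigmoid(W [match; diff] + b)   # one gate value per dimension
        fused = g * match + (1 - g) * diff     # element-wise interpolation

    `weights` has one row per output dimension; each row has
    len(match_vec) + len(diff_vec) entries.
    """
    if len(match_vec) != len(diff_vec):
        raise ValueError("match_vec and diff_vec must have the same length")
    if len(weights) != len(match_vec) or len(bias) != len(match_vec):
        raise ValueError("weights and bias must have one entry per dimension")

    concat = list(match_vec) + list(diff_vec)
    fused = []
    for i, (m, d) in enumerate(zip(match_vec, diff_vec)):
        if len(weights[i]) != len(concat):
            raise ValueError("each weight row must match the concatenated input")
        # Gate pre-activation: affine map of the concatenated input.
        pre = sum(w * x for w, x in zip(weights[i], concat)) + bias[i]
        g = sigmoid(pre)
        # A gate near 1 keeps the matching feature; near 0 keeps the difference feature.
        fused.append(g * m + (1.0 - g) * d)
    return fused


if __name__ == "__main__":
    zero_w = [[0.0] * 4 for _ in range(2)]
    # With zero weights and zero bias, every gate is 0.5, so the result is the mean.
    print(gated_fusion([1.0, 3.0], [3.0, 1.0], zero_w, [0.0, 0.0]))
```

In a trained model the gate weights would be learned jointly with the matcher, so the network can decide per dimension how much of the difference signal to let through. The toy call above only checks the interpolation behaviour.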
Keywords/Search Tags: Intention Matching Corpus, Intention Matching Methods, Word Sense Disambiguation, Medical Literature Retrieval