Similar Text Discrimination Based On Siamese Network

Posted on: 2021-04-08
Degree: Master
Type: Thesis
Country: China
Candidate: J Wu
Full Text: PDF
GTID: 2518306302954189
Subject: Applied Statistics
Abstract/Summary:
In an information retrieval system, relevant information must be returned quickly after the user enters a query. In Q&A communities such as Zhihu and Baidu Knows, quickly matching a new question to the most similar existing questions and returning their answers prevents duplicate questions and improves the user experience. In intelligent customer service and question answering systems, the user's intention must be identified accurately so that the user's request can be resolved effectively. In all of these examples, the most basic and central problem is judging the semantic similarity of texts.

Intelligent customer service systems based on artificial intelligence have become commonplace in daily life. On the one hand, they help enterprises save substantial human customer service costs. On the other hand, they help users resolve common problems quickly and improve efficiency. However, as the number of users grows, request types diversify, and users' language habits differ, phenomena such as synonyms and word order changes make it increasingly challenging for intelligent customer service to identify user intentions accurately. Against this background, how to determine effectively whether two texts have the same meaning has become a problem worth studying.

The traditional approach to judging semantic similarity is to extract carefully hand-crafted natural language features and then classify them with models such as logistic regression, support vector machines, and decision trees. These methods are simple to build and easy to interpret. They perform well on small, specific corpora but do not perform well on other tasks. Moreover, when the text contains synonyms, polysemy, word order changes, ellipsis, and other grammatical phenomena, traditional machine learning methods struggle to capture its deep semantics, so they cannot discriminate effectively.

To address these problems, this paper proposes a text semantic similarity discrimination model based on a Siamese Recurrent Convolutional Neural Network (Siamese-RCNN). The model builds on previous studies and combines the strength of recurrent neural networks in capturing sequence relations with the strength of convolutional neural networks in capturing key information. One difference between this model and previous methods is that the word vectors are fine-tuned by a recurrent convolutional neural network. This encoding reads the text from both directions at once, using the context to interpret each word more accurately, and modeling on the fine-tuned word vectors improves the final results. Another innovation of this paper is the integration of multi-angle interaction information. Previous methods performed only one interaction when extracting the interaction information of two texts. This paper instead performs multiple interactions on the intermediate results of the modeling process, extracts interaction features from multiple perspectives, and fuses them into a final vector that represents the semantic correlation of the two texts and is used for discrimination. To further improve accuracy, this paper also carries out multi-model fusion, combining the discrimination results of traditional methods with those of deep learning to obtain the final semantic similarity result.

Empirical analysis shows that the model incorporating multi-angle interaction performs well, with the AUC of the final model reaching 0.8371 and the F1-score reaching 0.5600. By contrast, the encoding-based model, which does not consider interaction information, reaches an AUC of only about 0.7970. These results indicate the importance of interaction features to the discrimination results. In addition, fusing the predictions of the traditional machine learning method and the neural network model improves discrimination accuracy to some extent: the AUC reaches about 0.8402 and the F1-score rises to 0.5712, which suggests that surface text features contribute to semantic discrimination.
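
The abstract does not list the specific interaction operations or the fusion rule, so the following is only a minimal sketch of the two ideas, not the thesis's implementation. It assumes each text has already been encoded into a fixed-length vector, uses three illustrative interaction views (absolute difference, element-wise product, cosine similarity), and fuses two model scores by a simple weighted average. The function names `interaction_features` and `fuse_scores` and the `weight` parameter are hypothetical.

```python
import math


def interaction_features(u, v):
    """Combine two equal-length sentence vectors into one feature vector.

    The three views used here (absolute difference, element-wise product,
    cosine similarity) are illustrative assumptions, not the thesis's list.
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    abs_diff = [abs(a - b) for a, b in zip(u, v)]
    product = [a * b for a, b in zip(u, v)]
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Guard against zero vectors, for which cosine similarity is undefined.
    cosine = sum(product) / (norm_u * norm_v) if norm_u and norm_v else 0.0
    # Resulting length is 2 * len(u) + 1.
    return abs_diff + product + [cosine]


def fuse_scores(neural_prob, traditional_prob, weight=0.5):
    """Weighted average of two similarity probabilities (assumed fusion rule)."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * neural_prob + (1.0 - weight) * traditional_prob
```

In a setup like this, the concatenated interaction vector would be passed to a classifier layer, and the fusion weight would be tuned on a validation set rather than fixed in advance.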
Keywords/Search Tags: Text similarity, Siamese network, Recurrent convolutional neural network, Attention mechanism