
Text Semantic Similarity Algorithm Based On Transformer

Posted on: 2021-04-04    Degree: Master    Type: Thesis
Country: China    Candidate: M F Zhao    Full Text: PDF
GTID: 2428330614453816    Subject: Computer Science and Technology
Abstract/Summary:
Semantic similarity is a core problem in natural language processing and plays an important role in many tasks, such as web page retrieval, automatic scoring, text classification, automatic question answering, and natural language generation. With the rapid development of computer and Internet technologies, automatic grading has gradually come into public view. The automatic grading of objective questions has become increasingly mature, but subjective questions, whose answers can vary widely, remain difficult to handle. Traditional semantic similarity calculation methods often ignore the key points in an answer, and missing these scoring points leads to inaccurate scores. To address this problem, we propose a Transformer-DSSM model for semantic similarity calculation to improve the accuracy of automatic scoring of subjective questions.

Generally speaking, semantic similarity can be judged more accurately on short texts with complete semantics. Therefore, before the semantic similarity calculation is performed, this paper uses Semantic Integrity Analysis to convert the long text of an answer into multiple semantically complete Chinese short texts, which serve as the data set for the semantic similarity calculation.

Semantic similarity calculation generally involves four steps: word segmentation, word vector representation, feature extraction, and similarity calculation. In the word vector representation step, this paper introduces Position Embedding: a GRU network encodes the positions at which words appear in the sequence, which better captures the contextual features of the words. The feature vectors and position encodings are then fed into a network based on Transformer encoder layers for feature extraction, passing in turn through encoder layers composed of an attention layer and a feed-forward neural network; to prevent the network from overfitting, residual connections and normalization are applied. After the features of each word have been extracted, Global-Attention is introduced to compute the feature vector representation of each sentence. Once the feature vector representations of the two sentences have been obtained, an Attention-over-Attention layer is introduced to extract the interaction information between the two sentences, and finally the cosine similarity of the two sentences is calculated from the combined feature vector.

Compared with the traditional WMD algorithm, CBOW, DSSM, CDSSM, LSTM-DSSM, and other methods, this model improves the accuracy of automatic scoring of subjective questions on political topics. The experimental results show that the proposed Transformer-DSSM with semantic integrity analysis is more accurate than traditional methods in semantic similarity calculation.
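The thesis publishes no code, but the encoding pipeline described above can be sketched in PyTorch. In the sketch below, all module names, dimensions, and hyperparameters (d_model, n_heads, n_layers, max_len) are illustrative assumptions, not the author's actual configuration: a GRU encodes token positions, the result is added to the word embeddings, and standard Transformer encoder layers (attention plus feed-forward) extract per-word features.

    # Minimal sketch, assuming a GRU-based position encoding added to
    # word embeddings before standard Transformer encoder layers.
    import torch
    import torch.nn as nn

    class GRUPositionEncoder(nn.Module):
        """Encodes token positions with a GRU instead of fixed sinusoids."""
        def __init__(self, d_model: int, max_len: int = 512):
            super().__init__()
            self.pos_embed = nn.Embedding(max_len, d_model)
            self.gru = nn.GRU(d_model, d_model, batch_first=True)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, seq_len, d_model) word embeddings
            positions = torch.arange(x.size(1), device=x.device)
            pos_vecs = self.pos_embed(positions).unsqueeze(0).expand(x.size(0), -1, -1)
            pos_enc, _ = self.gru(pos_vecs)   # sequential position encoding
            return x + pos_enc                # inject position information

    class TransformerFeatureExtractor(nn.Module):
        """Word embeddings + GRU position encoding -> Transformer encoder."""
        def __init__(self, vocab_size: int, d_model: int = 256,
                     n_heads: int = 4, n_layers: int = 2):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, d_model)
            self.pos_encoder = GRUPositionEncoder(d_model)
            layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                               dim_feedforward=4 * d_model,
                                               batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, n_layers)

        def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
            # token_ids: (batch, seq_len) -> (batch, seq_len, d_model)
            x = self.pos_encoder(self.embed(token_ids))
            return self.encoder(x)   # per-word feature vectors

Note that nn.TransformerEncoderLayer already applies residual connections and layer normalization internally, which matches the abstract's remark about adjustments to prevent overfitting.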
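The Global-Attention step named in the abstract can be read as a learned weighting that collapses per-word features into a single sentence vector. The linear scoring function below is an assumption, since the thesis does not specify the parameterization:

    # Hedged sketch of Global-Attention pooling over per-word features.
    import torch
    import torch.nn as nn

    class GlobalAttentionPooling(nn.Module):
        """Weights each word feature by a learned relevance score."""
        def __init__(self, d_model: int):
            super().__init__()
            self.score = nn.Linear(d_model, 1)

        def forward(self, h: torch.Tensor) -> torch.Tensor:
            # h: (batch, seq_len, d_model) per-word features
            weights = torch.softmax(self.score(h), dim=1)  # (batch, seq_len, 1)
            return (weights * h).sum(dim=1)                # (batch, d_model)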
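The Attention-over-Attention step follows the general AoA idea: the pairwise match matrix between the two sentences is normalized along both axes, and the attention in one direction is averaged and used to re-weight the other. Exactly how the thesis forms the final combined feature vector is not stated, so the combination below is one plausible reading, not the author's published layer:

    # Sketch of Attention-over-Attention interaction plus the final
    # cosine-similarity score, under the assumptions stated above.
    import torch
    import torch.nn.functional as F

    def attention_over_attention(h1: torch.Tensor, h2: torch.Tensor) -> torch.Tensor:
        # h1: (batch, len1, d), h2: (batch, len2, d) word-level features
        match = torch.bmm(h1, h2.transpose(1, 2))   # pairwise match matrix
        alpha = torch.softmax(match, dim=1)         # attention over sentence 1
        beta = torch.softmax(match, dim=2)          # attention over sentence 2
        avg_beta = beta.mean(dim=1)                 # (batch, len2)
        weights = torch.bmm(alpha, avg_beta.unsqueeze(-1))  # (batch, len1, 1)
        return (weights * h1).sum(dim=1)            # interaction-weighted vector

    def similarity(h1: torch.Tensor, h2: torch.Tensor) -> torch.Tensor:
        """Final score: cosine similarity of the combined feature vectors."""
        u = attention_over_attention(h1, h2)
        v = attention_over_attention(h2, h1)
        return F.cosine_similarity(u, v, dim=-1)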
Keywords/Search Tags: Natural Language Processing, Semantic Integrity, Semantic Similarity, Transformer, Attention, Deep Structured Semantic Model