
Text Semantic Similarity Algorithm Based On Transformer

Posted on: 2021-04-04    Degree: Master    Type: Thesis
Country: China    Candidate: M F Zhao    Full Text: PDF
GTID: 2428330614453816    Subject: Computer Science and Technology
Abstract/Summary:
Semantic similarity is a core problem in natural language processing and plays an important role in many tasks, such as web page retrieval, automatic scoring, text classification, automatic question answering, and natural language generation. With the rapid development of computer and Internet technologies, automatic grading has gradually come into public view. The automatic grading of objective questions has become increasingly mature, but subjective questions, whose answers can vary widely, remain difficult to handle. Traditional semantic similarity calculation methods often ignore the key points in an answer, and missing these scoring points leads to inaccurate scores. To address this problem, we propose a Transformer-DSSM model for semantic similarity calculation to improve the accuracy of automatic scoring of subjective questions.

Generally speaking, semantic similarity can be judged more accurately on short texts with complete semantics. Therefore, before the semantic similarity calculation is performed, this paper uses Semantic Integrity Analysis to convert the long text of an answer into multiple semantically complete Chinese short texts, which serve as the data set for the semantic similarity calculation.

Semantic similarity calculation generally involves four steps: word segmentation, word vector representation, feature extraction, and similarity calculation. In the word vector representation step, this paper introduces Position Embedding: a GRU network encodes the positions at which words appear in the sequence, which better captures the contextual features of the words. The feature vectors and position encodings are then fed into a network based on Transformer encoder layers for feature extraction, passing in turn through encoder layers composed of an attention layer and a feed-forward neural network; to prevent the network from overfitting, residual connections and normalization are applied. After the features of each word have been extracted, Global-Attention is introduced to compute the feature vector representation of each sentence. Once the feature vector representations of the two sentences have been obtained, an Attention-over-Attention layer is introduced to extract the interaction information between the two sentences, and finally the cosine similarity of the two sentences is calculated from the combined feature vector.

Compared with the traditional WMD algorithm, CBOW, DSSM, CDSSM, LSTM-DSSM, and other methods, this model improves the accuracy of automatic scoring of subjective questions on political topics. The experimental results show that the proposed Transformer-DSSM with semantic integrity analysis is more accurate than traditional methods in semantic similarity calculation.
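The thesis publishes no code, but the encoding pipeline described above can be sketched in PyTorch. In the sketch below, all module names, dimensions, and hyperparameters (d_model, n_heads, n_layers, max_len) are illustrative assumptions, not the author's actual configuration: a GRU encodes token positions, the result is added to the word embeddings, and standard Transformer encoder layers (attention plus feed-forward) extract per-word features.

    # Minimal sketch, assuming a GRU-based position encoding added to
    # word embeddings before standard Transformer encoder layers.
    import torch
    import torch.nn as nn

    class GRUPositionEncoder(nn.Module):
        """Encodes token positions with a GRU instead of fixed sinusoids."""
        def __init__(self, d_model: int, max_len: int = 512):
            super().__init__()
            self.pos_embed = nn.Embedding(max_len, d_model)
            self.gru = nn.GRU(d_model, d_model, batch_first=True)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, seq_len, d_model) word embeddings
            positions = torch.arange(x.size(1), device=x.device)
            pos_vecs = self.pos_embed(positions).unsqueeze(0).expand(x.size(0), -1, -1)
            pos_enc, _ = self.gru(pos_vecs)   # sequential position encoding
            return x + pos_enc                # inject position information

    class TransformerFeatureExtractor(nn.Module):
        """Word embeddings + GRU position encoding -> Transformer encoder."""
        def __init__(self, vocab_size: int, d_model: int = 256,
                     n_heads: int = 4, n_layers: int = 2):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, d_model)
            self.pos_encoder = GRUPositionEncoder(d_model)
            layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                               dim_feedforward=4 * d_model,
                                               batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, n_layers)

        def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
            # token_ids: (batch, seq_len) -> (batch, seq_len, d_model)
            x = self.pos_encoder(self.embed(token_ids))
            return self.encoder(x)   # per-word feature vectors

Note that nn.TransformerEncoderLayer already applies residual connections and layer normalization internally, which matches the abstract's remark about adjustments to prevent overfitting.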
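The Global-Attention step named in the abstract can be read as a learned weighting that collapses per-word features into a single sentence vector. The linear scoring function below is an assumption, since the thesis does not specify the parameterization:

    # Hedged sketch of Global-Attention pooling over per-word features.
    import torch
    import torch.nn as nn

    class GlobalAttentionPooling(nn.Module):
        """Weights each word feature by a learned relevance score."""
        def __init__(self, d_model: int):
            super().__init__()
            self.score = nn.Linear(d_model, 1)

        def forward(self, h: torch.Tensor) -> torch.Tensor:
            # h: (batch, seq_len, d_model) per-word features
            weights = torch.softmax(self.score(h), dim=1)  # (batch, seq_len, 1)
            return (weights * h).sum(dim=1)                # (batch, d_model)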
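The Attention-over-Attention step follows the general AoA idea: the pairwise match matrix between the two sentences is normalized along both axes, and the attention in one direction is averaged and used to re-weight the other. Exactly how the thesis forms the final combined feature vector is not stated, so the combination below is one plausible reading, not the author's published layer:

    # Sketch of Attention-over-Attention interaction plus the final
    # cosine-similarity score, under the assumptions stated above.
    import torch
    import torch.nn.functional as F

    def attention_over_attention(h1: torch.Tensor, h2: torch.Tensor) -> torch.Tensor:
        # h1: (batch, len1, d), h2: (batch, len2, d) word-level features
        match = torch.bmm(h1, h2.transpose(1, 2))   # pairwise match matrix
        alpha = torch.softmax(match, dim=1)         # attention over sentence 1
        beta = torch.softmax(match, dim=2)          # attention over sentence 2
        avg_beta = beta.mean(dim=1)                 # (batch, len2)
        weights = torch.bmm(alpha, avg_beta.unsqueeze(-1))  # (batch, len1, 1)
        return (weights * h1).sum(dim=1)            # interaction-weighted vector

    def similarity(h1: torch.Tensor, h2: torch.Tensor) -> torch.Tensor:
        """Final score: cosine similarity of the combined feature vectors."""
        u = attention_over_attention(h1, h2)
        v = attention_over_attention(h2, h1)
        return F.cosine_similarity(u, v, dim=-1)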
Keywords/Search Tags: Natural Language Processing, Semantic Integrity, Semantic Similarity, Transformer, Attention, Deep Structured Semantic Model