Research And Implementation Of Long Text Semantic Similarity Algorithm

Posted on:2021-02-19

Degree:Master

Type:Thesis

Country:China

Candidate:J Zhang

Full Text:PDF

GTID:2428330605476055

Subject:Computer Science and Technology

Abstract/Summary:

The rapid development of information technology and the spread of mobile terminals have accelerated the transfer of information, and the growing volume of text data has become an important source of information for people. Application scenarios for text semantic similarity calculation are multiplying. For short texts, information retrieval returns the most relevant answers to a query, and intelligent customer-service dialogue systems return matching sentences from a back-end database based on the questions raised by the user. Long texts such as paragraphs have many applications in news classification, plagiarism detection, and automatic essay scoring, and are of research value. Advances in natural language processing provide methods for calculating text similarity, and deep learning models have achieved good results on short-text similarity tasks. However, the existing methods perform poorly on long texts, because paragraphs are more complex in composition than sentences, which makes their semantic similarity harder to calculate. Building on a review of the existing methods, this thesis takes paragraphs as an example and calculates paragraph semantic similarity from two angles: paragraph semantic vector representation and paragraph text summarization.

Paragraphs are composed of multiple sentences, and each sentence contains multiple words, so the semantic representation of a paragraph can be derived from the semantic representations of its sentences. Based on this observation, this thesis proposes a method that builds the representation hierarchically to obtain paragraph vectors. It comprises word encoding, word attention, sentence encoding, and sentence attention. The encoding uses BiLSTM, the attention uses a multi-head attention mechanism, and a CNN is finally used to further extract semantic features. After the vectors of a paragraph pair are obtained, the cosine similarity between them is used as the similarity score. Compared with long short-term memory networks, the proposed model has the following advantages: (1) Multi-head attention can extract features from multiple dimensions of the sequence data and aggregate them into the final representation. It also computes the semantic relevance between any two words in a sentence, which is information that the traditional attention mechanism cannot obtain. (2) Because convolutional neural networks are effective at local feature extraction, a convolutional neural network is used to further extract local features after sentence encoding.

The high dimensionality of paragraphs and the long span of their context make the calculation harder; reducing the paragraph dimension reduces that difficulty. This thesis therefore also proposes a paragraph similarity algorithm based on generated summaries. The main idea is to summarize each paragraph automatically, on the assumption that the summary can represent the semantics of the paragraph. In this way, the paragraph similarity problem is reduced to computing the similarity of sentence pairs. The existing extractive and generative summarization methods are studied, and a hierarchically structured generative text summarization model is proposed. It uses the encoder-decoder framework: the words are hierarchically encoded at the encoder, the resulting sentence vector representations are then input to a BiLSTM for selection, and the newly generated sentence-level vector is passed to the decoder as the intermediate semantic state. The decoder uses multiple LSTM layers combined with attention. The multi-layer recurrent neural network improves the accuracy of the generated summaries to a certain extent and improves the generalization ability of the model.
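
The hierarchical idea behind the first method (word vectors pooled into sentence vectors, sentence vectors pooled into a paragraph vector, and a cosine score for the pair) can be illustrated with a minimal Python sketch. This is not the thesis's model: the BiLSTM encoders, the multi-head attention, and the CNN are all replaced here by a single dot-product attention pooling step with fixed, hand-picked query vectors. The function names, the query vectors, and the toy word vectors are assumptions made only for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen in this sketch for zero vectors
    return dot / (norm_u * norm_v)


def attention_pool(vectors, query):
    """Softmax-weighted average of `vectors`, scored by dot product with `query`.

    Assumption: a stand-in for the learned attention layers in the abstract.
    """
    scores = [sum(a * b for a, b in zip(vec, query)) for vec in vectors]
    max_score = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - max_score) for s in scores]
    total = sum(weights)
    dim = len(vectors[0])
    return [
        sum(w * vec[i] for w, vec in zip(weights, vectors)) / total
        for i in range(dim)
    ]


def paragraph_vector(paragraph, word_query, sentence_query):
    """Pool words into sentence vectors, then sentences into one paragraph vector.

    `paragraph` is a list of sentences; each sentence is a list of word vectors.
    """
    sentence_vectors = [attention_pool(sent, word_query) for sent in paragraph]
    return attention_pool(sentence_vectors, sentence_query)


# Toy 2-dimensional word vectors (hypothetical values, not from the thesis).
word_query = [1.0, 0.0]
sentence_query = [0.0, 1.0]
para_a = [[[1.0, 0.0], [0.5, 0.5]], [[0.0, 1.0]]]
para_b = [[[0.9, 0.1]], [[0.1, 0.9], [0.0, 1.0]]]

score = cosine_similarity(
    paragraph_vector(para_a, word_query, sentence_query),
    paragraph_vector(para_b, word_query, sentence_query),
)
print(round(score, 3))
```

In a trained model the query vectors and the word representations would be learned rather than fixed, and each pooling step would operate on encoder outputs instead of raw word vectors. The two-level structure and the final cosine score are the only parts this sketch shares with the method described in the abstract.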

Related items

1	Calculating Phenotypic Similarity Between Genes Using Hierarchical Structure Data Based On Semantic Similarity
2	Research On Semantic Similarity Matching Algorithm Of Questions Based On Deep Learning
3	Research On Multi-modal Similarity Learning
4	Semantic Similarity Measurement Of Short Text By Convolutional Neural Network Based On Multi-Dimensional Attention On Word Vector
5	Aspect-based Sentiment Analysis Method Based On Hierarchical Attention Networks
6	The Research On Remaining Processing Time Prediction Of Business Process Based On Multi-Head Attention LSTM Adversarial Network
7	Research On Automatic Document Summary Based On Generative
8	Research On Hybrid Video Recommendation Algorithm Based On Multi-Head Self Attention Mechanism
9	Image Semantic Segmentation Based On Generative Adversarial Networks And Self-Attention Mechanism
10	Text Semantic Similarity Algorithm Based On Transformer