Natural language processing is a core problem in artificial intelligence. Solving it requires enabling machines to correctly parse the semantics of natural language and to obtain some form of semantic representation. Since sentences are the main units that carry semantic information in natural language, accurately interpreting sentence semantics is key to natural language understanding. Given that distributed word embeddings have been successfully applied in tasks such as machine translation and automatic summarization, it is natural to extend distributed representations to longer units of text such as sentences, paragraphs, or documents, that is, to map their semantics into a low-dimensional continuous space. Sentences, formed by combining words according to syntactic structure, are the basic linguistic units from which paragraphs and documents are built. Existing sentence embedding methods mainly compute a weighted sum or average of the word embeddings in a sentence, ignoring word order and syntactic structure, so the learned sentence embeddings are inaccurate.

This paper proposes two methods to address two sources of this inaccuracy: the lack of syntactic structure information, and the long-distance dependencies introduced by long sentences. The first method learns sentence embeddings from syntactic structure features: to reduce parameter-training time and the influence of unimportant words on sentence semantics, complex syntax trees are pruned during construction, and the syntactic information is converted into word weights. The second method fuses each word vector with weights derived from its syntactic information (part of speech, phrase, and clause), highlighting the semantic contribution of words in different sentence structures; the fused vectors are then encoded by a siamese LSTM network, which effectively mitigates the long-distance dependency problem caused by excessive sentence length. Compared with existing supervised and unsupervised learning algorithms, the proposed methods improve the accuracy of sentence similarity computation.
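The fusion step of the second method can be illustrated with a minimal sketch. The example below weights each word vector by a score derived from its part-of-speech tag before averaging into a sentence vector, then compares two sentence vectors by cosine similarity. The tag set and weight values here are assumptions for illustration only, and the siamese LSTM encoder of the actual method is omitted for brevity.

```python
import numpy as np

# Hypothetical syntactic weights: content words (nouns, verbs) are assumed
# to contribute more to sentence semantics than function words. These values
# are illustrative, not the trained weights of the proposed method.
ROLE_WEIGHTS = {"NOUN": 1.0, "VERB": 1.0, "ADJ": 0.8, "DET": 0.2, "ADP": 0.2}

def sentence_embedding(word_vectors, pos_tags):
    """Fuse word vectors with syntactic weights: a weighted average in which
    each word's weight depends on its part-of-speech tag."""
    weights = np.array([ROLE_WEIGHTS.get(tag, 0.5) for tag in pos_tags])
    vecs = np.stack(word_vectors)
    return (weights[:, None] * vecs).sum(axis=0) / weights.sum()

def cosine_similarity(a, b):
    """Similarity score between two sentence vectors, in [-1, 1]."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

In the full method, the weighted vectors would be fed token by token into two LSTM encoders with shared parameters (the siamese configuration), and similarity would be computed between the two final hidden states rather than between simple averages.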