Many natural language processing tasks, such as information retrieval and question answering, can be reduced to computing the distance between two texts. From the perspective of cognitive linguistics, language learning can be divided into stages that progress from easy to difficult, and these differences in learning difficulty can also be abstracted as a kind of distance. Once the content of one stage has been mastered, the learner must move on to the next stage according to the current level of difficulty. At present, textbooks for the different stages are compiled manually on the basis of experience. Automating this process with natural language processing techniques would help improve the efficiency of language learning and, at the same time, provide computational evidence for related research in cognitive linguistics. This thesis therefore explores deep learning methods for calculating the distance between sentences.

In recent years, natural language processing has developed rapidly in the framework of distributional semantics, and its most important achievement is the word vector. However, current research on word vectors is carried out on pure text corpora, which is very different from the human cognitive process: people acquire knowledge from multiple perspectives through rich sensory information, especially visual information. This thesis therefore studies word representations from a multimodal point of view. Mainstream methods generate multimodal word vectors by directly combining word vectors with image features, which is rather coarse: an image usually contains many different objects, while the object a word refers to may occupy only a local region of the image. This thesis therefore proposes a multimodal word vector construction method based on a spatial attention mechanism, which enhances the local representation of the target object. The results show that multimodal word vectors model semantic similarity better: on the related similarity tasks, they reach a Spearman correlation of 0.819, a clear improvement over conventional word vectors.

This thesis further studies how to construct sentence embeddings from the multimodal word vectors and proposes three improved methods, based respectively on a neural bag-of-words model, a bidirectional RNN, and a gated CNN. To make the distance calculation model applicable to more tasks, a distance calculation module is added that allows the model to adjust the sentence embedding space to the specific task. The model is evaluated on paraphrase identification, answer selection, and sentence difficulty rating. The experimental results show that the proposed method can adapt its notion of distance to the specific task. On paraphrase identification and answer selection, it reaches an accuracy of 85.4% and a MAP of 71.6% respectively, on par with mainstream models. On sentence difficulty rating, the Spearman correlation is 0.692, which indicates that the method can, to some extent, model the abstract concept of "difficulty distance".
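To make the idea of word-conditioned spatial attention concrete, here is a minimal pure-Python sketch. It is not the thesis's implementation; it assumes dot-product attention between the word (projected into visual space by a hypothetical learned matrix `proj`) and one feature vector per image region, a softmax over regions, and concatenation of the text vector with the attended visual vector. The function names `attend_regions` and `multimodal_word_vector` are illustrative only.

```python
import math


def softmax(scores):
    # Subtract the max score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    # Multiply a matrix (given as a list of rows) by a vector.
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attend_regions(word_vec, region_feats, proj):
    """Spatial attention over image regions, conditioned on a word.

    word_vec:     text word vector (dimension d_w)
    region_feats: one visual feature vector per spatial region (dimension d_v)
    proj:         hypothetical learned d_v x d_w matrix mapping the word into visual space
    """
    query = matvec(proj, word_vec)                  # word projected into visual space
    scores = [dot(query, r) for r in region_feats]  # relevance of each region to the word
    weights = softmax(scores)                       # attention distribution over regions
    dim = len(region_feats[0])
    # Attention-weighted sum of region features emphasizes the target object's location.
    attended = [sum(w * r[k] for w, r in zip(weights, region_feats)) for k in range(dim)]
    return attended, weights


def multimodal_word_vector(word_vec, region_feats, proj):
    # Assumed fusion step: concatenate the text vector with the attended visual vector.
    visual, _ = attend_regions(word_vec, region_feats, proj)
    return list(word_vec) + visual


if __name__ == "__main__":
    word = [1.0, 0.0]
    regions = [[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]]  # the second region matches the word
    proj = [[1.0, 0.0], [0.0, 1.0]]                 # identity projection for the toy example
    vis, w = attend_regions(word, regions, proj)
    print(w)
    print(multimodal_word_vector(word, regions, proj))
```

In the toy example almost all attention weight falls on the second region, so the visual half of the multimodal vector is dominated by that region rather than by an average over the whole image, which is the contrast with direct combination drawn in the text.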
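The sketch below illustrates, under stated assumptions, how a task-adjustable sentence distance and its Spearman evaluation could fit together. A plain average of word vectors stands in for the neural bag-of-words encoder (the bidirectional RNN and gated CNN encoders are not shown), and the distance module is assumed to be a task-specific linear map `M` with d(u, v) = ||M(u - v)||; the thesis does not specify this form, and all function names here are hypothetical. The `spearman` function computes Spearman's rho as the Pearson correlation of average ranks, the metric reported for word similarity and sentence difficulty rating.

```python
import math


def bag_of_words_embedding(sentence, word_vectors):
    # Simplified encoder: average the vectors of the known words in the sentence.
    vecs = [word_vectors[w] for w in sentence if w in word_vectors]
    if not vecs:
        raise ValueError("no known words in sentence")
    dim = len(vecs[0])
    return [sum(v[k] for v in vecs) / len(vecs) for k in range(dim)]


def task_distance(u, v, transform):
    # Hypothetical distance module: Euclidean distance after a task-specific
    # linear map M, i.e. d(u, v) = ||M(u - v)||. M would be learned per task.
    diff = [a - b for a, b in zip(u, v)]
    mapped = [sum(m * d for m, d in zip(row, diff)) for row in transform]
    return math.sqrt(sum(x * x for x in mapped))


def ranks(values):
    # 1-based average ranks; tied values share the mean of their positions.
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(xs, ys):
    # Spearman's rho: Pearson correlation computed on the rank vectors.
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    first_dim_only = [[1.0, 0.0], [0.0, 0.0]]
    # The same pair of embeddings is farther apart under one task's transform than another's.
    print(task_distance([0.0, 0.0], [3.0, 4.0], identity))
    print(task_distance([0.0, 0.0], [3.0, 4.0], first_dim_only))
    print(spearman([1, 2, 3, 4], [10, 20, 30, 40]))
```

Changing only the transform changes which pairs count as close, which is the sense in which a distance module lets one embedding space serve paraphrase identification, answer selection, and difficulty rating.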