Cohesion and coherence are two of the most basic features of text composition. Generally, cohesion is the means of lexical and grammatical linking, and coherence is the result of using these linking means. Cohesion refers to the connectedness of a text's surface structure and forms the visible network of the text, while coherence refers to the underlying semantic relatedness of the text and forms its invisible network. According to the functional linguist Halliday, a coherent text is linked by semantically related, similar components. If a text lacks such components, there will be gaps in its cohesion, which leads to incoherence.

Coherence modeling is a fundamental problem in natural language processing and can be applied in many of its applications, such as sentiment analysis, statistical machine translation, text generation and text summarization. It aims to establish computable models of the degree of coherence between the sentences of a text. Early work on textual coherence modeling mainly applied feature engineering to extract various semantic features from the text, such as entity information and syntactic path information, and then selected features and trained a classifier. With the great success of deep learning in speech and image processing, some researchers began to build coherence models of English texts using neural networks. However, current deep learning coherence models neither integrate effectively with the earlier entity-based models nor highlight the important role of entities in textual cohesion. Moreover, existing models are mainly based on English texts, and there has been little research on coherence models for Chinese texts.

The main research work of this thesis consists of the following two aspects. First, this thesis proposes a textual coherence model based on a recurrent neural network, in which the entities in the text are given distributed representations and the entity information shared between sentences is effectively integrated. The results on the English and Chinese sentence ranking tasks and on the evaluation of machine translation coherence show the effectiveness of the approach. Second, this thesis proposes a coherence model based on bidirectional LSTM (Long Short-Term Memory). The entity information of adjacent sentences in the text is extracted, given distributed representations, and integrated into the sentence-level bidirectional LSTM model through several simple and effective vector operations. The effectiveness of this method is likewise demonstrated by the experimental results on the English and Chinese sentence ranking tasks and on the evaluation of machine translation coherence.

In summary, building on deep learning, this thesis explores the importance of entity cohesion in modeling textual coherence, proposes solutions to the related problems, and verifies the validity of the methods experimentally. This work is of some significance for future coherence modeling of Chinese texts.