Relation Recognition Of Non-saturated Chinese Compound Sentences With Two Clauses Based On Deep Learning

Posted on: 2021-05-17
Degree: Master
Type: Thesis
Country: China
Candidate: K L Sun
GTID: 2428330605961314
Subject: Software engineering
Abstract/Summary:
Compound sentence relation recognition identifies the semantic relation between clauses, which is key to analyzing the semantic relationships of sentences. Research on recognizing compound sentence relation categories helps advance machine translation, question-answering systems, automatic abstract generation, and other fields, thereby improving the overall performance of these systems. It is difficult to identify the semantic relationship of non-saturated marked compound sentences, because their relational markers cannot explicitly indicate the relation category of the sentence. This paper studies the classification of non-saturated marked compound sentences.

There are two main methods for identifying compound sentence relations. One relies on constraints formed by linguistic rules: linguists summarize constraints from a large number of corpus texts and build a corresponding rule base. The other uses statistical methods to extract lexical and literal features of sentences from a large-scale corpus, constructing the feature set through feature engineering on the corpus text. However, feature sets obtained with these methods generalize poorly, and recognition accuracy is not high. In addition, building the feature set is a large engineering effort that requires a lot of labor and time. To mine the semantic features contained in compound sentences more deeply, and to capture the semantic correlation between clauses, this paper proposes processing Chinese compound sentences with deep learning and introduces a word embedding model to represent the words in sentences.

The main contributions of this paper are as follows.

First, this paper builds a compound sentence corpus from text extracted from "Changjiang Daily", "People's Daily", and some contemporary novels. It counts and summarizes the relation categories of compound sentences and their corresponding relation markers. On this basis, it also creates a corpus of non-saturated compound sentences with two clauses, which serves as the main data set of this research.

Second, this paper proposes a network model that combines CNN and Bi-LSTM on top of word clustering. The model first applies a word clustering algorithm to the word vectors in order to extract semantic similarity features between words. A CNN then models the compound sentences at a deeper level to obtain the local features of sentences. In addition, the model partially improves the CNN so that it automatically identifies and classifies the relation categories of compound sentences.

Then, this paper proposes a multi-channel convolutional neural network model based on an inner-attention mechanism. The inner-attention model is also built on Bi-LSTM. To make full use of the text features, a convolutional neural network (CNN) is used jointly to model the sentence representation and obtain local features. The goal is a richer and more salient feature representation, and therefore better prediction of the relation categories of compound sentences.

Finally, the proposed deep learning methods are validated on the data set of non-saturated compound sentences with two clauses. The experimental results show that both the combined CNN and Bi-LSTM model based on word clustering and the multi-channel convolutional neural network based on the inner-attention mechanism perform better than methods based on linguistic rules and statistics, while preserving the extensibility of the model. In addition, the multi-channel convolutional neural network based on the inner-attention (intra-sentence attention) mechanism introduces semantic correlation features between clauses and uses attention to focus on the more important semantic information in a sentence. This promotes semantic feature learning during training and enhances the self-learning ability of the model. Therefore, the latter model performs better than the former.
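
The abstract does not give the formulas behind the inner-attention and multi-channel convolution steps, so the following is only a minimal sketch of the general idea, not the thesis's implementation. It assumes the Bi-LSTM has already produced one hidden-state vector per word, scores each state with a dot product against a hypothetical learned vector `query`, and treats each entry of `kernels` as one convolution channel followed by max-over-time pooling. All function names, the scoring function, and the toy numbers are illustrative assumptions.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(scores):
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(hidden_states, query):
    """Weight each time step by softmax(query . h_t) and sum the states.

    `query` stands in for a learned parameter (an assumption of this sketch).
    """
    weights = softmax([dot(query, h) for h in hidden_states])
    dim = len(hidden_states[0])
    pooled = [sum(w * h[i] for w, h in zip(weights, hidden_states))
              for i in range(dim)]
    return pooled, weights

def conv_max_pool(hidden_states, kernel):
    """One convolution channel: slide the kernel over time, apply ReLU,
    then keep the maximum response (max-over-time pooling).

    Assumes the sequence is at least as long as the kernel.
    """
    width = len(kernel)
    responses = []
    for start in range(len(hidden_states) - width + 1):
        window = hidden_states[start:start + width]
        activation = sum(dot(k, h) for k, h in zip(kernel, window))
        responses.append(max(0.0, activation))
    return max(responses)

def sentence_features(hidden_states, query, kernels):
    """Concatenate the attention-pooled vector with one local feature per channel."""
    pooled, _ = attention_pool(hidden_states, query)
    local = [conv_max_pool(hidden_states, k) for k in kernels]
    return pooled + local

# Toy input: three time steps with two-dimensional hidden states.
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
kernels = [
    [[1.0, 1.0]],              # channel with window width 1
    [[1.0, 0.0], [0.0, 1.0]],  # channel with window width 2
]
features = sentence_features(states, [0.0, 0.0], kernels)
```

In a full model, the resulting feature vector would feed a classifier over the relation categories; that layer, and the training of `query` and `kernels`, are left out here.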
Keywords/Search Tags: Compound sentence relation classification, Non-saturated Chinese compound sentences, Deep learning, Relational marker, Semantic representation