
Research On Text Modeling Based On Convolutional Neural Network Approaches

Posted on: 2018-04-26
Degree: Master
Type: Thesis
Country: China
Candidate: K Xing
Full Text: PDF
GTID: 2348330518983392
Subject: Computer software and theory
Abstract/Summary:
In machine learning, data representation is key to the performance of downstream tasks. Text is a major class of data, and text representation underlies many natural language processing tasks. A text representation model aims to analyze and express the semantic information of text in order to achieve better results in text classification, machine translation, question answering, and other natural language processing tasks. Traditional text representation methods, such as the bag-of-words model, suffer from data sparsity and the curse of dimensionality, so the resulting models generalize poorly. In recent years, with the development of machine learning, a variety of text representation models based on neural networks have appeared. These models learn a mapping from text to low-dimensional continuous vectors that all lie in the same vector space, which improves the expressive power of the model. Among neural networks, the convolutional neural network has comparatively strong feature selection ability.

However, existing neural network text representation models have some problems. First, the same word is represented by a single vector regardless of the text in which it appears, whereas feature extraction needs to distinguish between the senses of polysemous and homonymous words. Second, ordinary neural network text representation models cannot effectively capture the semantic and structural information carried by combinations of variable-length sequences of text units, which greatly reduces the performance of document representation models. To address these problems, this thesis compares neural network text representation methods at two levels, sentences and documents, and proposes an improved representation model at each level. The main work is as follows.

Firstly, we propose a sentence representation model based on topic word embedding. In the word embedding matrix of the input layer, the same word should have different features when it carries different semantic information in different texts. Each word in a sentence is therefore assigned the topic information of the text it belongs to, which yields a topic word embedding for each word. To avoid introducing irrelevant topic information into the intermediate layers of the network, a topic transfer matrix is added, computed from the similarity between topics and words and from their probability distributions. The topic word embedding is integrated into the neural network through the topic transfer matrix, which resolves the ambiguity of a word across different texts. Experimental results show that the proposed representation performs better on the sentence-level sentiment classification task.

Secondly, we propose a document representation model based on the convolutional neural network. Ordinary neural network text models cannot capture long-distance semantic relations in document text. The sequence of topic word embeddings for the words of the whole document is therefore processed by a long short-term memory (LSTM) network, whose hidden state sequence contains long-distance semantic and structural information; a convolutional neural network then extracts text features from this sequence to obtain the text representation. Two models are presented, depending on whether the semantic interaction between sentences in the document is considered: the document semantic memory text representation model and the sentence-document semantic memory text representation model. Experimental results show that the proposed representations perform better on the document-level sentiment classification task.
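
The abstract does not give formulas, so the following is only a rough sketch of the sentence-level idea, not the thesis's actual model. It assumes a transfer weight for each topic equal to the topic's probability in the text multiplied by the (non-negative) cosine similarity between the word vector and the topic vector, and it assumes the weighted topic vectors are simply added to the word vector. The function names `cosine`, `topic_word_embedding`, and `conv_max_pool`, and all toy vectors, are hypothetical. The last function shows a single convolution filter with max-over-time pooling, the usual way a convolutional network extracts one feature from a sequence of embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def topic_word_embedding(word_vec, topic_vecs, topic_probs):
    """Blend a word vector with topic vectors.

    Assumed transfer weight for topic k (not taken from the thesis):
        P(k | text) * max(0, cos(word, topic_k))
    so a topic that is unlikely in the text, or unrelated to the word,
    contributes little or nothing.
    """
    weights = [p * max(0.0, cosine(word_vec, t))
               for t, p in zip(topic_vecs, topic_probs)]
    out = list(word_vec)
    for w, t in zip(weights, topic_vecs):
        out = [o + w * x for o, x in zip(out, t)]
    return out


def conv_max_pool(seq, filt):
    """Apply one filter over `seq`, then ReLU, then max-over-time pooling.

    `seq` is a list of embedding vectors and `filt` is a list of len(filt)
    weight rows of the same dimension; assumes len(seq) >= len(filt).
    """
    width = len(filt)
    scores = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        s = sum(f * x
                for frow, xrow in zip(filt, window)
                for f, x in zip(frow, xrow))
        scores.append(max(0.0, s))  # ReLU
    return max(scores)


# Toy example: the same word vector in two texts with different topic mixes.
topics = [[1.0, 0.0], [0.0, 1.0]]   # two hypothetical topic vectors
word = [1.0, 1.0]                   # an ambiguous word, equally close to both
emb_a = topic_word_embedding(word, topics, [0.9, 0.1])  # text dominated by topic 0
emb_b = topic_word_embedding(word, topics, [0.1, 0.9])  # text dominated by topic 1

# One feature extracted from a three-word sentence with a width-2 filter.
feature = conv_max_pool([emb_a, emb_b, word], [[1.0, 0.0], [0.0, 1.0]])
```

In this toy setup the word receives a different vector in each text (`emb_a` leans toward the first topic, `emb_b` toward the second), which is the kind of disambiguation the abstract attributes to topic word embeddings. For the document-level models, the abstract places an LSTM between the embeddings and the convolution; that step is omitted here to keep the sketch short.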
Keywords/Search Tags: Text Representation, Convolutional Neural Network, Word Embedding, Topic Transfer Matrix, Long Short-Term Memory Network