Text classification, whose goal is to assign labels to text, is one of the classic tasks in natural language processing (NLP). It has a wide range of applications, such as spam detection, sentiment classification, and topic labeling. A good text representation is key to achieving strong performance on NLP tasks such as text classification. The traditional text representations are the bag-of-words (BOW) model and the vector space model (VSM); however, these methods not only lose the contextual information of a text but are also sparse and suffer from the curse of dimensionality. With the performance improvement of hardware and the growth in the amount of available data, deep learning methods for text representation, such as convolutional neural networks (CNNs), recurrent neural networks (RNNs), and attention mechanisms, have become increasingly popular and have achieved better performance.

In this thesis, we propose two sentence-level text representation and classification methods based on deep neural networks, as follows:

First, bidirectional recurrent neural networks and convolutional neural networks for text representation and classification (BRCNN). In BRCNN, the inputs are word vectors; a recurrent structure is then applied to capture the contextual information of a sentence; next, a convolutional structure effectively extracts the features of the sentence, followed by a max-pooling operation that produces the sentence vector; finally, the softmax function is used for classification. The RNN captures the word-order information in a sentence, and the CNN effectively extracts the useful features of a sentence. We test our model on eight benchmark text classification datasets and obtain better performance.

Second, an attention mechanism and convolutional neural networks for text representation and classification (ACNN). In ACNN, the attention mechanism provides context vectors for the following convolution layer; the convolution layer extracts features from the context vectors, and a max-pooling operation converts the text into a low-dimensional feature vector to obtain the sentence vector; finally, a softmax classifier is used for text classification. We test our model on eight benchmark text classification datasets and obtain better performance.

In addition, this thesis proposes a bi-attention mechanism. The bi-attention mechanism applies attention to both the forward and the backward RNN to obtain the forward context vector and the backward context vector, respectively, and then concatenates them to obtain the final context vector.
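The BRCNN pipeline (word vectors, bidirectional recurrence, convolution, max-pooling, softmax) can be sketched roughly as below. This is a minimal illustration with random weights, a simple tanh recurrence, and a width-1 convolution over the concatenated contexts; the dimensions and all parameter names are assumptions for illustration, not the thesis's actual architecture or hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(0)
d_word, d_hid, d_conv, n_cls, seq_len = 50, 32, 64, 4, 7

# Illustrative random parameters (in practice these would be learned).
W_f = rng.normal(0, 0.1, (d_hid, d_word + d_hid))       # forward recurrence
W_b = rng.normal(0, 0.1, (d_hid, d_word + d_hid))       # backward recurrence
W_c = rng.normal(0, 0.1, (d_conv, d_word + 2 * d_hid))  # convolution filters
W_o = rng.normal(0, 0.1, (n_cls, d_conv))               # softmax layer

x = rng.normal(0, 1, (seq_len, d_word))  # word vectors of one sentence

# Bidirectional recurrence: left and right context for each word.
h_f = np.zeros((seq_len, d_hid))
h_b = np.zeros((seq_len, d_hid))
for t in range(seq_len):
    prev = h_f[t - 1] if t > 0 else np.zeros(d_hid)
    h_f[t] = np.tanh(W_f @ np.concatenate([x[t], prev]))
for t in reversed(range(seq_len)):
    nxt = h_b[t + 1] if t < seq_len - 1 else np.zeros(d_hid)
    h_b[t] = np.tanh(W_b @ np.concatenate([x[t], nxt]))

# Convolve [left context; word; right context], then max-pool over time.
feats = np.tanh(np.concatenate([h_f, x, h_b], axis=1) @ W_c.T)  # (seq_len, d_conv)
sent_vec = feats.max(axis=0)                                    # sentence vector

# Softmax classification.
logits = W_o @ sent_vec
probs = np.exp(logits - logits.max())
probs /= probs.sum()
```

The max-pooling step keeps, for each filter, its strongest response anywhere in the sentence, which is what turns a variable-length sequence into a fixed-size sentence vector.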
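The ACNN pipeline (attention context vectors, convolution, max-pooling, softmax) can be sketched in the same spirit. The scaled dot-product scoring used here is only one common way to compute attention weights and is an assumption of this sketch, as are the window size and all dimensions.

```python
import numpy as np

rng = np.random.default_rng(1)
d_word, d_conv, n_cls, seq_len, win = 50, 64, 4, 7, 3

W_c = rng.normal(0, 0.1, (d_conv, win * d_word))  # convolution filters
W_o = rng.normal(0, 0.1, (n_cls, d_conv))         # softmax layer

x = rng.normal(0, 1, (seq_len, d_word))  # word vectors of one sentence

# Attention: each position attends over the whole sentence to build its
# context vector (illustrative scaled dot-product scoring).
scores = x @ x.T / np.sqrt(d_word)
alpha = np.exp(scores - scores.max(axis=1, keepdims=True))
alpha /= alpha.sum(axis=1, keepdims=True)
ctx = alpha @ x  # (seq_len, d_word) context vectors

# Convolve windows of context vectors, then max-pool over time.
pad = np.zeros((win // 2, d_word))
padded = np.vstack([pad, ctx, pad])
windows = np.stack([padded[i:i + win].ravel() for i in range(seq_len)])
feats = np.tanh(windows @ W_c.T)  # (seq_len, d_conv)
sent_vec = feats.max(axis=0)      # sentence vector

logits = W_o @ sent_vec
probs = np.exp(logits - logits.max())
probs /= probs.sum()
```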
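The bi-attention mechanism (separate attention over the forward and backward RNN states, with the two context vectors concatenated) can be sketched as follows; the learned scoring vectors `v_f` and `v_b` and the tanh recurrence are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
d_word, d_hid, seq_len = 50, 32, 7

W_f = rng.normal(0, 0.1, (d_hid, d_word + d_hid))  # forward recurrence
W_b = rng.normal(0, 0.1, (d_hid, d_word + d_hid))  # backward recurrence
v_f = rng.normal(0, 0.1, d_hid)  # forward attention scoring vector
v_b = rng.normal(0, 0.1, d_hid)  # backward attention scoring vector

x = rng.normal(0, 1, (seq_len, d_word))  # word vectors of one sentence

h_f = np.zeros((seq_len, d_hid))
h_b = np.zeros((seq_len, d_hid))
for t in range(seq_len):
    prev = h_f[t - 1] if t > 0 else np.zeros(d_hid)
    h_f[t] = np.tanh(W_f @ np.concatenate([x[t], prev]))
for t in reversed(range(seq_len)):
    nxt = h_b[t + 1] if t < seq_len - 1 else np.zeros(d_hid)
    h_b[t] = np.tanh(W_b @ np.concatenate([x[t], nxt]))

def attend(h, v):
    # Weighted sum of hidden states under softmax-normalised scores.
    s = h @ v
    a = np.exp(s - s.max())
    a /= a.sum()
    return a @ h

c_f = attend(h_f, v_f)            # forward context vector
c_b = attend(h_b, v_b)            # backward context vector
ctx = np.concatenate([c_f, c_b])  # bi-attention context vector
```

Attending to each direction separately lets the left-to-right and right-to-left passes each pick out the positions most useful to them before the two summaries are merged.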