
Short Text Classification Based On Multi-granularity Feature Representation And Recurrent Convolutional Neural Network

Posted on: 2020-07-23    Degree: Master    Type: Thesis
Country: China    Candidate: T T Hong    Full Text: PDF
GTID: 2428330575463064    Subject: Computer Science and Technology
Abstract/Summary:
With the advent of the big data era, the volume of electronic text has grown dramatically, and short texts account for a large share of it, such as questions posed by users in question-answering systems and product reviews. Maintaining, managing, and applying such large collections of short texts is extremely challenging.

Text classification is a basic task in natural language processing and centers on two aspects: feature engineering and the classification algorithm. Feature engineering is the foundation of text classification. Most traditional feature engineering relies on the bag-of-words model, which ignores the semantic information and word-order features of the text and suffers from the "curse of dimensionality". Moreover, short text data on the Internet is growing exponentially and is complex and diverse in type; classifying it with traditional machine learning algorithms is not only time-consuming and labor-intensive but also generalizes poorly, especially on imbalanced data sets. The study of feature representation and classification models is therefore crucial. In recent years, the rapid development of deep neural networks has brought new hope to text classification: deep neural networks outperform traditional machine learning algorithms in classification performance and generalize well.

This thesis studies short text classification based on multi-granularity feature representation and deep learning classification models. The main research work of this thesis is as follows:

1. The BLSTM_MLPCNN classification model is proposed. The model consists of three parts: a bidirectional long short-term memory (BiLSTM) network, a multi-layer perceptron convolutional neural network, and a fully connected layer. First, the BiLSTM captures the contextual information of each word in the input layer and builds a deeper text feature representation. The multi-layer perceptron convolutional network then performs local feature extraction and down-samples the key features. Finally, classification is carried out by the fully connected layer and the softmax function. Experiments on five standard English data sets show that the multi-layer perceptron convolutional network extracts features more effectively and improves classification accuracy, and that the dual-input BLSTM_MLPCNN model based on GloVe word vectors and character-level vectors achieves good results on short text classification tasks.

2. A multi-granularity feature representation method is proposed. On the one hand, the word2vec tool is used to train a word embedding for each word on a Wikipedia corpus; each embedding carries semantic information, and the semantic relationship between words can be measured by similarity. On the other hand, the BTM topic model is used to learn the topic-word distribution, i.e., the probability of each word under each topic; this distribution is then transformed by Bayes' rule into word-topic vectors, which give the probability of each topic given a word. The representation thus captures both the fine-grained semantics of words and an abstract, topic-level description of the text. Experiments with classical machine learning algorithms and deep learning models show that combining the word granularity and the topic level to represent the text effectively improves classification performance. In particular, the BLSTM_MLPCNN model with a dual input layer of word2vec word vectors and word-topic vectors achieves better classification results, and for different data sets the dimension of the word-topic vector has a measurable impact on the results.
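The multi-layer perceptron convolution step described in part 1 can be illustrated with a minimal NumPy sketch: a windowed convolution over the (contextual) word vectors, followed by a per-position MLP layer (a 1x1 convolution, in the Network-in-Network style) and max-over-time pooling. This is not the thesis implementation; all dimensions, weights, and names here are illustrative assumptions.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def mlp_conv(seq, conv_w, conv_b, mlp_w, mlp_b, window=3):
    """One multi-layer perceptron convolution block: a windowed
    convolution, a per-position MLP (1x1 convolution), and
    max-over-time pooling of the key features."""
    n, d = seq.shape                  # n words, d-dim vector per word
    k = conv_w.shape[0]               # number of convolution filters
    # windowed convolution over the word sequence
    feats = np.empty((n - window + 1, k))
    for i in range(n - window + 1):
        patch = seq[i:i + window].reshape(-1)   # concatenate the window
        feats[i] = relu(conv_w @ patch + conv_b)
    # per-position MLP layer (equivalent to a 1x1 convolution)
    feats = relu(feats @ mlp_w.T + mlp_b)
    # max-over-time pooling: down-sample to the strongest feature per unit
    return feats.max(axis=0)

rng = np.random.default_rng(0)
seq = rng.standard_normal((10, 8))       # 10 words, 8-dim contextual vectors
conv_w = rng.standard_normal((16, 3 * 8)) * 0.1
conv_b = np.zeros(16)
mlp_w = rng.standard_normal((6, 16)) * 0.1
mlp_b = np.zeros(6)
pooled = mlp_conv(seq, conv_w, conv_b, mlp_w, mlp_b)
print(pooled.shape)                      # one pooled value per MLP unit
```

In the full model, `seq` would be the BiLSTM's output sequence rather than raw embeddings, and `pooled` would feed the fully connected softmax layer.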
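The Bayes step in part 2, which turns the topic-word distribution into word-topic vectors, amounts to normalizing p(word | topic) * p(topic) over topics for each word. The tiny example below uses a hand-made 2-topic, 4-word distribution; the numbers and the uniform-style prior are illustrative assumptions, not values from the thesis.

```python
import numpy as np

# topic-word distribution p(word | topic), as learned by a topic model
# such as BTM: rows are topics, columns are vocabulary words
phi = np.array([
    [0.50, 0.30, 0.10, 0.10],   # topic 0
    [0.10, 0.10, 0.40, 0.40],   # topic 1
])
theta = np.array([0.6, 0.4])    # prior p(topic), assumed for illustration

# Bayes' rule: p(topic | word) = p(word | topic) p(topic) / p(word)
joint = phi * theta[:, None]               # p(word, topic)
word_topic = (joint / joint.sum(axis=0)).T # normalize per word, then transpose

# each row of word_topic is now the word-topic vector of one vocabulary
# word: the probability of each topic given that word
print(word_topic)
```

Stacking these rows for the words of a short text yields the topic-level input channel that, together with the word2vec channel, forms the dual input layer described above.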
Keywords/Search Tags:short text classification, multi-granularity feature representation, word-topic vector, multi-layer perceptron convolutional neural network