Research On Short Text Representation And Classification Based On Convolutional Neural Network

Posted on: 2019-03-21
Degree: Master
Type: Thesis
Country: China
Candidate: R Wang
Full Text: PDF
GTID: 2438330545493150
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet, the amount of text data being generated has increased dramatically, in particular the short texts produced by social networking and shopping platforms. These data are numerous and highly varied. Massive short text data provides a large amount of material for natural language processing research, and it also poses challenges for text classification and related research. This thesis is devoted to text classification, one of the basic tasks in natural language processing, and aims to improve the accuracy of short text classification.

Text classification tasks are mainly divided into long text classification and short text classification. Because short texts are brief, highly sparse, and semantically weak, traditional statistical linguistic techniques encounter many difficulties when applied to them. For example, because short text is highly open, traditional word encoding cannot efficiently produce high-quality encodings. Because short texts carry little semantic information, traditional document representation models often perform poorly on them. To address these difficulties, this thesis studies short text classification from the following three aspects:

(1) Word vector encoding. The density peak clustering algorithm and the affinity propagation clustering algorithm were used to cluster word vectors. First, an artificial neural network was used to encode words through word embedding, which improved the encoding of the words in the text. Then the trained word vectors were clustered in order to reduce the influence of noise points on the classification model. The dictionary obtained by clustering gave a more consistent encoding.

(2) A convolutional neural network classification model. A new double-channel convolutional neural network classification model was proposed. First, the improved double-channel model was used to obtain more semantic features and enrich the semantic context. Then successive convolution and pooling layers were used to reduce the feature size, so that the features were compressed into a low-dimensional feature map. The improved model effectively improved the accuracy of text classification.

(3) The attention mechanism in convolutional neural networks. A multi-level convolutional neural network classification model based on the attention mechanism was proposed. A word attention layer and a sentence attention layer were added to the convolutional neural network structure. The word attention layer mainly calculates weights for the initial sentence semantic information. The sentence attention layer mainly calculates weights for the features abstracted by the convolution layer. The multi-level attention mechanism strengthens the feature extraction ability of the model and improves the accuracy of the convolutional neural network classification model.

Several sets of comparison experiments were set up from different perspectives to verify the proposed word vector clustering, the double-channel convolutional neural network, and the attention-based convolutional neural network model. The experiments showed that the proposed methods are accurate and efficient, and that they improve on other classification models in the same field. This indicates that the work is meaningful and makes a contribution to the study of short text classification.
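
As an illustration of the word attention layer described in aspect (3), the following is a minimal sketch of how attention weights over the words of a short text could be computed. The abstract does not state how the thesis scores words, so the dot-product scoring, the `query` vector, the function names, and the toy embedding values are all assumptions made for this example, not the author's method.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def word_attention(word_vectors, query):
    """Weight each word vector by its dot-product similarity to `query`.

    Returns (weights, sentence_vector). Dot-product scoring is an
    assumption; the abstract does not say how words are scored.
    """
    scores = [sum(w * q for w, q in zip(vec, query)) for vec in word_vectors]
    weights = softmax(scores)
    dim = len(query)
    # The sentence vector is the attention-weighted sum of the word vectors.
    sentence = [
        sum(weights[i] * word_vectors[i][d] for i in range(len(word_vectors)))
        for d in range(dim)
    ]
    return weights, sentence


# Toy 2-dimensional embeddings for a three-word short text (made-up values).
words = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 0.0]
weights, sentence = word_attention(words, query)
```

In this sketch the weights sum to 1, and words whose vectors align with `query` receive larger weights. A sentence attention layer could apply the same weighting to the feature vectors produced by the convolution layer instead of to the raw word vectors.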
Keywords/Search Tags: Short-text Classification, Short-text Representation, Convolutional Neural Network