
Research On Short Text Classification Based On Feature Representation And Dense Gated Recurrent Convolutional Network

Posted on: 2021-05-01
Degree: Master
Type: Thesis
Country: China
Candidate: M Y Xue
GTID: 2428330620965859
Subject: Computer technology

Abstract/Summary:
With the rapid development of social networks and online shopping platforms, electronic text has become ubiquitous, and a large proportion of it is short text, such as news headlines, Twitter posts, and online shopping reviews. The commercial value and practical applications of accurate short text classification have attracted wide attention in academia. Short text classification is now used in many fields, including personalized recommendation, sentiment analysis, public opinion tracking, and question-answer selection. Recently, the rapid rise of deep learning has driven research on text classification. In terms of classifier performance, deep neural networks generally outperform traditional machine learning algorithms, and increasing network depth allows higher-order text features to be extracted, which helps improve classification performance; however, vanishing and exploding gradients readily occur during training. In addition, the sparseness and ambiguity of short text degrade classification performance. In view of these issues, this thesis carries out the following research on two aspects: the classification model and the text feature representation.

(1) Considering the respective strengths and weaknesses of recurrent and convolutional neural networks for modeling text sequences, and the fact that deeper networks are prone to vanishing or exploding gradients, a hybrid model based on a densely connected bidirectional gated recurrent unit convolutional network (DC-BiGRU_CNN) is proposed in this thesis. Firstly, a standard convolutional neural network is used to learn character-level word vectors, which are concatenated with word-level word vectors to form the network input layer. Inspired by densely connected convolutional networks, a densely connected bidirectional gated recurrent unit proposed in this thesis is used for high-level semantic modeling of the text; it alleviates vanishing and exploding gradients and strengthens feature transfer between layers, thereby achieving feature reuse. Next, convolution and pooling are applied to the deep high-level semantic representation to obtain the final semantic feature representation, which is fed into a softmax layer to complete the classification. Experimental results on several public datasets show that DC-BiGRU_CNN achieves a significant improvement in classification accuracy. In addition, this thesis analyzes the contribution of the model's individual components and studies the effect of parameters such as the maximum sentence length, the number of network layers, and the convolution kernel size, as illustrated by the sketch below.
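To make the DC-BiGRU_CNN pipeline in (1) concrete, the following PyTorch sketch gives one possible reading of it: a character-level CNN produces per-word character vectors that are concatenated with word embeddings, a stack of bidirectional GRU layers is densely connected so that each layer receives the input together with the outputs of all preceding layers, and a convolution with max-over-time pooling feeds the final classifier. The class names, layer sizes, and layer counts are illustrative assumptions, not the thesis's exact configuration.

import torch
import torch.nn as nn
import torch.nn.functional as F


class CharCNN(nn.Module):
    """Builds a character-level vector for each word with a 1-D convolution."""

    def __init__(self, n_chars, char_dim=30, out_dim=50, kernel=3):
        super().__init__()
        self.embed = nn.Embedding(n_chars, char_dim, padding_idx=0)
        self.conv = nn.Conv1d(char_dim, out_dim, kernel, padding=kernel // 2)

    def forward(self, chars):                        # chars: (B, T, C) character ids
        B, T, C = chars.shape
        x = self.embed(chars.view(B * T, C))         # (B*T, C, char_dim)
        x = self.conv(x.transpose(1, 2))             # (B*T, out_dim, C)
        x = F.max_pool1d(x, x.size(-1)).squeeze(-1)  # max over characters of each word
        return x.view(B, T, -1)                      # (B, T, out_dim)


class DCBiGRUCNN(nn.Module):
    def __init__(self, n_words, n_chars, n_classes,
                 word_dim=300, char_out=50, hidden=128, n_layers=3,
                 n_filters=100, kernel=3):
        super().__init__()
        self.word_embed = nn.Embedding(n_words, word_dim, padding_idx=0)
        self.char_cnn = CharCNN(n_chars, out_dim=char_out)
        # Densely connected BiGRU stack: layer k receives the input layer
        # concatenated with the outputs of all previous BiGRU layers.
        in_dim = word_dim + char_out
        self.grus = nn.ModuleList(
            nn.GRU(in_dim + k * 2 * hidden, hidden,
                   batch_first=True, bidirectional=True)
            for k in range(n_layers))
        feat_dim = in_dim + n_layers * 2 * hidden
        self.conv = nn.Conv1d(feat_dim, n_filters, kernel, padding=kernel // 2)
        self.fc = nn.Linear(n_filters, n_classes)

    def forward(self, words, chars):
        # Input layer: word-level embedding concatenated with char-level CNN vector.
        x = torch.cat([self.word_embed(words), self.char_cnn(chars)], dim=-1)
        features = [x]
        for gru in self.grus:
            out, _ = gru(torch.cat(features, dim=-1))
            features.append(out)                     # dense (DenseNet-style) connection
        h = torch.cat(features, dim=-1)              # (B, T, feat_dim)
        h = F.relu(self.conv(h.transpose(1, 2)))     # convolution over the sequence
        h = F.max_pool1d(h, h.size(-1)).squeeze(-1)  # max-over-time pooling
        return self.fc(h)                            # logits for the softmax classifier

In this reading, the repeated torch.cat(features, dim=-1) is what realizes the dense connections: every GRU layer and the final convolution see the lower-level features directly, which is the mechanism the thesis credits with easing gradient flow and enabling feature reuse.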
(2) To address the ambiguity and sparsity of short texts from a multi-granularity perspective, a short text classification method that combines BERT embeddings with BTM topic vectors is proposed in this thesis. Firstly, dynamic word vectors are generated by the BERT pre-trained language model, and topic vectors are constructed with BTM. The BERT embeddings are then combined with the BTM topic vectors, so that the text is represented at two levels of granularity: the word level and the topic level. The fused representation is fed into the DC-BiGRU_CNN deep learning model for semantic modeling and classification. Experimental results show that the fused feature representation enriches the semantic information of the text and effectively improves classification performance. In addition, the effects of max pooling and average pooling on performance are analyzed experimentally, and the influence of the number of topics on the model is studied.
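For the multi-granularity representation in (2), the sketch below shows one way to fuse token-level BERT embeddings with a document-level BTM topic distribution before handing the sequence to the classifier. It uses the Hugging Face transformers library for BERT; the BTM topic vector is assumed to be precomputed by a separate BTM implementation (the random vector in the example is only a stand-in), and fusion by concatenation is an illustrative interpretation of the thesis's description.

import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")


def fuse_features(text, btm_topic_vec):
    """Concatenate token-level BERT embeddings with a document-level
    BTM topic vector broadcast to every token position."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=64)
    with torch.no_grad():
        token_emb = bert(**inputs).last_hidden_state           # (1, T, 768)
    T = token_emb.size(1)
    topic = btm_topic_vec.view(1, 1, -1).expand(1, T, -1)      # (1, T, n_topics)
    return torch.cat([token_emb, topic], dim=-1)               # (1, T, 768 + n_topics)


# Example with a made-up 20-topic distribution; the fused sequence would then be
# passed to the DC-BiGRU_CNN model for semantic modeling and classification.
btm_vec = torch.softmax(torch.randn(20), dim=0)
fused = fuse_features("great quality and fast delivery", btm_vec)
print(fused.shape)  # torch.Size([1, T, 788])

Since the topic distribution is identical at every position, an alternative design is to concatenate it once with a pooled sentence vector instead of with each token; the abstract does not specify which variant the thesis uses.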
Keywords/Search Tags: short text classification, densely connected, bidirectional gated recurrent unit, BERT and BTM, multi-granularity feature representation