
Research On Multi-scale Text Classification Algorithm Based On Deep Learning

Posted on: 2022-05-29
Degree: Master
Type: Thesis
Country: China
Candidate: Z Q Tao
Full Text: PDF
GTID: 2518306545955459
Subject: Software engineering
Abstract/Summary:
With the blooming of the big data era, the volume of text data on social media keeps increasing, especially on forums and microblogs. Although informatization has enriched people's lives, it has also made supervision more difficult: social media contains negative news that, if left uncontrolled, can have a harmful impact, yet the sheer amount of text is difficult to process manually. Moderating text information with computer algorithms is therefore an important research topic in the current computer field. Depending on sample length, datasets can be divided into short-text and long-text datasets, which call for different processing algorithms. With the continuous development of machine learning and deep learning, text classification technology has also made great progress, but many problems remain.

To address the difficulties that long texts pose for classification models, this paper proposes a hierarchical self-attention hybrid sparse network for document classification. First, the method splits a long text into sentences and encodes each as a sentence vector; the document representation is then built from those sentence vectors, which avoids the feature-extraction problems caused by long documents. Moreover, to overcome the model's inability to capture important features, a self-attention mechanism is used, assigning more weight to the important features. Finally, pruning the structure of the RNN gating units reduces the number of parameters and the computing time. Extensive experiments demonstrate that our model achieves competitive performance and outperforms previous models.

To address the vanishing-gradient problem, the deficiency of text features, and the phrase-feature dimension mismatch that arise in attention mechanisms when training neural networks for short-text classification, a new method based on dense-pooling connections and a phrase attention mechanism is proposed. The method extracts features while alleviating gradient vanishing through a residual network, and reuses important features through dense pooling connections. A phrase attention mechanism is then used to resolve the phrase dimension mismatch of the traditional attention mechanism.

The two models proposed in this paper can effectively process texts of different lengths and solve the problems of feature extraction and missing features that each length presents. The appropriate model can be selected according to the length of the text, which effectively improves text classification accuracy. Finally, experiments show that our models classify general datasets effectively with excellent results, which proves the effectiveness of the models.
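The hierarchical idea for long texts can be illustrated with a minimal numpy sketch: sentence vectors are re-weighted by scaled dot-product self-attention and then pooled into a single document vector. This is only an illustrative toy, not the thesis's actual network; the dimensions, pooling choice, and function names are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_pool(sent_vecs):
    """Scaled dot-product self-attention over sentence vectors,
    then mean-pool the attended outputs into one document vector."""
    d = sent_vecs.shape[-1]
    scores = sent_vecs @ sent_vecs.T / np.sqrt(d)  # (n, n) pairwise scores
    weights = softmax(scores, axis=-1)             # each row sums to 1
    attended = weights @ sent_vecs                 # re-weighted sentence vectors
    return attended.mean(axis=0)                   # document representation

# toy document: 3 "sentence vectors" of dimension 4
doc = np.random.default_rng(0).normal(size=(3, 4))
doc_vec = self_attention_pool(doc)
print(doc_vec.shape)  # (4,)
```

In a full model the sentence vectors would come from a (pruned) recurrent encoder over the words of each sentence, and the pooled document vector would feed a classification layer.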
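One simple way to avoid the phrase-dimension mismatch mentioned above is to build phrase vectors that live in the same space as the word vectors, so a single attention step applies to both. The sketch below averages sliding n-gram windows and attends over the resulting phrase vectors; it is a hedged illustration of the general idea, not the thesis's phrase attention mechanism, and all names and sizes are invented for the example.

```python
import numpy as np

def phrase_features(word_vecs, n=2):
    """Form phrase vectors by averaging each sliding window of n word
    vectors, so phrase and word features share the same dimension."""
    return np.stack([word_vecs[i:i + n].mean(axis=0)
                     for i in range(len(word_vecs) - n + 1)])

def attend(features, query):
    """Dot-product attention: weight each phrase vector by its
    similarity to a query vector and sum the weighted vectors."""
    scores = features @ query
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ features

words = np.random.default_rng(1).normal(size=(6, 8))  # 6 words, dim 8
phrases = phrase_features(words, n=2)                 # 5 bigram vectors
sent_vec = attend(phrases, query=words.mean(axis=0))
print(phrases.shape, sent_vec.shape)  # (5, 8) (8,)
```

Because phrases and words share one dimensionality here, the same attention routine serves both granularities, which is the property the mismatch fix is after.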
Keywords/Search Tags: text classification, deep learning, convolutional neural networks, self-attention, recurrent neural networks