
Research On Multi-scale Text Classification Algorithm Based On Deep Learning

Posted on: 2022-05-29
Degree: Master
Type: Thesis
Country: China
Candidate: Z Q Tao
Full Text: PDF
GTID: 2518306545955459
Subject: Software engineering
Abstract/Summary:
With the blooming of the big data era, the volume of text data on social media keeps increasing, especially on forums and microblogs. Although informatization has enriched people's lives, it has also made supervision more difficult: social media contains negative news that, if left uncontrolled, can have a harmful impact, yet the sheer amount of text is difficult to process manually. Moderating text information with computer algorithms is therefore an important research topic in the current computer field. Depending on sample length, datasets can be divided into short-text and long-text datasets, which call for different processing algorithms. With the continuous development of machine learning and deep learning, text classification technology has also made great progress, but many problems remain.

To address the difficulties that long texts pose for classification models, this paper proposes a hierarchical self-attention hybrid sparse network for document classification. First, the method splits a long text into sentences and encodes each as a sentence vector; the document representation is then built from those sentence vectors, which avoids the feature-extraction problems caused by long documents. Moreover, to overcome the model's inability to capture important features, a self-attention mechanism is used, assigning more weight to the important features. Finally, pruning the structure of the RNN gating units reduces the number of parameters and the computing time. Extensive experiments demonstrate that our model achieves competitive performance and outperforms previous models.

To address the vanishing-gradient problem, the deficiency of text features, and the phrase-feature dimension mismatch that arise in attention mechanisms when training neural networks for short-text classification, a new method based on dense-pooling connections and a phrase attention mechanism is proposed. The method extracts features while alleviating gradient vanishing through a residual network, and reuses important features through dense pooling connections. A phrase attention mechanism is then used to resolve the phrase dimension mismatch of the traditional attention mechanism.

The two models proposed in this paper can effectively process texts of different lengths and solve the problems of feature extraction and missing features that each length presents. The appropriate model can be selected according to the length of the text, which effectively improves text classification accuracy. Finally, experiments show that our models classify general datasets effectively with excellent results, which proves the effectiveness of the models.
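The hierarchical idea for long texts can be illustrated with a minimal numpy sketch: sentence vectors are re-weighted by scaled dot-product self-attention and then pooled into a single document vector. This is only an illustrative toy, not the thesis's actual network; the dimensions, pooling choice, and function names are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_pool(sent_vecs):
    """Scaled dot-product self-attention over sentence vectors,
    then mean-pool the attended outputs into one document vector."""
    d = sent_vecs.shape[-1]
    scores = sent_vecs @ sent_vecs.T / np.sqrt(d)  # (n, n) pairwise scores
    weights = softmax(scores, axis=-1)             # each row sums to 1
    attended = weights @ sent_vecs                 # re-weighted sentence vectors
    return attended.mean(axis=0)                   # document representation

# toy document: 3 "sentence vectors" of dimension 4
doc = np.random.default_rng(0).normal(size=(3, 4))
doc_vec = self_attention_pool(doc)
print(doc_vec.shape)  # (4,)
```

In a full model the sentence vectors would come from a (pruned) recurrent encoder over the words of each sentence, and the pooled document vector would feed a classification layer.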
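One simple way to avoid the phrase-dimension mismatch mentioned above is to build phrase vectors that live in the same space as the word vectors, so a single attention step applies to both. The sketch below averages sliding n-gram windows and attends over the resulting phrase vectors; it is a hedged illustration of the general idea, not the thesis's phrase attention mechanism, and all names and sizes are invented for the example.

```python
import numpy as np

def phrase_features(word_vecs, n=2):
    """Form phrase vectors by averaging each sliding window of n word
    vectors, so phrase and word features share the same dimension."""
    return np.stack([word_vecs[i:i + n].mean(axis=0)
                     for i in range(len(word_vecs) - n + 1)])

def attend(features, query):
    """Dot-product attention: weight each phrase vector by its
    similarity to a query vector and sum the weighted vectors."""
    scores = features @ query
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ features

words = np.random.default_rng(1).normal(size=(6, 8))  # 6 words, dim 8
phrases = phrase_features(words, n=2)                 # 5 bigram vectors
sent_vec = attend(phrases, query=words.mean(axis=0))
print(phrases.shape, sent_vec.shape)  # (5, 8) (8,)
```

Because phrases and words share one dimensionality here, the same attention routine serves both granularities, which is the property the mismatch fix is after.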
Keywords/Search Tags: text classification, deep learning, convolutional neural networks, self-attention, recurrent neural networks