Research On Long And Short Text Classification Algorithm Based On Deep Features

Posted on: 2022-11-18 | Degree: Master | Type: Thesis
Country: China | Candidate: L Q Yan
GTID: 2518306758466814 | Subject: Computer Science and Technology
Abstract/Summary:
Text classification is a fundamental problem in natural language processing. Its core is to extract key features that reflect the characteristics of a text and to build a mapping from those features to categories. Depending on the characteristics of the text, text classification falls into two main categories: short text classification and long text classification. Short text classification deals with texts that have few words and incomplete sentence structure, such as topics or comments; the features that matter are word features and sentence features with obvious emotional tendencies. Long text classification deals with texts that have many words, complete sentence structure, and rich context, such as articles or news; the features that matter are semantic relationship features and contextual relationship information features.

Existing solutions for short text classification can extract word features, but they cannot extract word features and sentence features at the same time. Existing solutions for long text classification can extract semantic relationship features as far as possible, but they cannot extract contextual relationship information features. To address these problems, the main research results of this paper are as follows.

For short text classification, this paper focuses on word features and sentence features with obvious emotional tendencies and proposes Text Desnet, built in four steps: a word embedding model, convolutional extraction of word and sentence features, construction of a parallel feature extraction framework, and an attention mechanism. The multi-scale convolutional feature extraction module in this network extracts features of words with obvious emotional tendencies. The densely connected convolutional feature extraction module extracts features of sentences with obvious emotional tendencies. The attention mechanism module accounts for the different contributions of word features and sentence features. Finally, Text Desnet is compared with several models (CNN, TextCNN, FastText, DPCNN, and Text Desnet-C) on three short text benchmark corpora (Game Multi Tweet, SemEval, and SSTweet). The experimental results show that Text Desnet improves short text classification accuracy by an average of 1.1%.

For long text classification, this paper focuses on semantic relationship features and contextual relationship information features and proposes Word Level GCN, built in four steps: a word embedding model, construction of multiple text graphs, a message propagation mechanism, and an attention mechanism. The construction of multiple text graphs in this network extracts both semantic relationship features and contextual relationship information features. The attention mechanism module accounts for the different contributions of the semantic relationship features and the contextual relationship information features. A separate graph is built for each text, each word is connected only to its nearest and previous p words, and the parameters are shared globally, which avoids a text graph that is too large and consumes too much memory. Finally, Word Level GCN is compared with several models (CNN, TextCNN, FastText, DPCNN, TextGCN, TensorGCN, and Word Level GCN-G) on three long text benchmark corpora (AG News, R8, and Yahoo! Answers). The experimental results show that Word Level GCN improves long text classification accuracy by an average of 1.4%.
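The per-text graph construction described for Word Level GCN can be illustrated with a minimal sketch. It is not the thesis's implementation: the function name `build_text_graph`, the whitespace tokenization, and the reading of "nearest and previous p words" as "the p positions immediately before each word" are all assumptions made for illustration. The point it shows is that the number of edges grows at most as n * p for a text of n tokens, so each graph stays small.

```python
from typing import List, Tuple


def build_text_graph(tokens: List[str], p: int) -> List[Tuple[int, int]]:
    """Hypothetical helper: return edges (i, j) that link token position i
    to each of the up-to-p positions j immediately preceding it.

    Assumption: "nearest and previous p words" means positions i-p .. i-1.
    """
    if p < 0:
        raise ValueError("p must be non-negative")
    edges: List[Tuple[int, int]] = []
    for i in range(len(tokens)):
        # Only look back at most p positions, never before the text start.
        for j in range(max(0, i - p), i):
            edges.append((i, j))
    return edges


# One graph per text: the edge count is bounded by len(tokens) * p.
tokens = "stocks fell sharply after the report".split()
edges = build_text_graph(tokens, p=2)
print(len(tokens), "nodes,", len(edges), "edges")
```

Because every text gets its own small graph and the window size p is fixed, memory use depends on the length of a single text rather than on the size of the whole corpus, which matches the motivation given above.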
Keywords/Search Tags:Text Classification, Feature Extraction, Parallel DenseNet, GCN, Attention Mechanism