
Research On Text Classification Method Based On Feature Embedding Representation

Posted on: 2021-01-17    Degree: Master    Type: Thesis
Country: China    Candidate: T S Wang    Full Text: PDF
GTID: 2428330602464606    Subject: Engineering
Abstract/Summary:
The development of computer technology has accelerated the information age and, with it, the exponential growth of data and the workload of processing it. To handle text data more efficiently, natural language processing (NLP) and its related research have received much attention. Text classification, a sub-task of NLP, is widely used in many fields, such as news categorization, digital libraries, sentiment analysis, and spam filtering. According to current research, text classification methods based on deep neural networks outperform those based on traditional machine learning, provided the classifier can be fully trained. The structure and application of deep neural networks are therefore an important route to better classification performance and a main research direction in the field.

The effectiveness of text classification depends not only on the design of the classifier but also on how text features are constructed. For discrete text data, building a specific, interpretable language model to obtain an embedding representation of the text, and improving the feature embedding representation to raise the quality of text quantization, are effective ways to improve classifier performance indirectly. Existing methods achieve excellent performance by combining text quantization methods with text classifiers, so improving feature embedding representation methods and combining them with deep neural networks is an effective way to improve text classification. Based on an analysis of the applications and process of text classification, the significance of text classification methods based on feature embedding representation is expounded. The specific research contents are as follows:

(1) A novel multi-label text classification method combining a dynamic semantic representation model and a deep neural network (DSRM-DNN) is proposed. DSRM-DNN uses a word embedding model and a clustering algorithm to select semantic words; the selected words are designated as the elements of DSRM-DNN and quantified by a weighted combination of word attributes. A text classifier is then constructed by combining a deep belief network with a back-propagation neural network, and low-frequency words and new words are re-expressed by the existing semantic words under a sparse constraint (see the sketch following this abstract). Experiments on RCV1-v2, Reuters-21578, EUR-Lex, and Bookmarks show that DSRM-DNN outperforms the comparative methods.

(2) A text classification framework combining a character-level convolutional network and a generative adversarial network (CCNN-GAN) is proposed. Texts are quantified by the character-level convolutional neural network (char-level CNN), and the resulting features are fed to the adversarial network and the classifier respectively. In the data augmentation module, the processed real data are used to train the generator and the discriminator so that the generative distribution progressively fits the real-data distribution; the classifier is then incrementally trained on both the real and the generated data. In this way the small-sample problem is addressed and the cost of text generation is reduced. Extensive experiments on four public datasets show that the method performs significantly better than the comparative methods.
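The DSRM-DNN description above can be read as a pipeline: select semantic words by clustering word embeddings, quantify each document over those words, then train a deep classifier. The sketch below is a minimal, hypothetical rendering of that pipeline; the function names, the plain term-frequency quantification, and the use of an MLP in place of the deep belief network + back-propagation classifier are assumptions for illustration, not the thesis implementation.

```python
# Hypothetical sketch of the DSRM-DNN feature pipeline (names illustrative only).
# An MLP stands in for the DBN + back-propagation classifier described above.
import numpy as np
from gensim.models import Word2Vec
from sklearn.cluster import KMeans
from sklearn.neural_network import MLPClassifier

def select_semantic_words(tokenized_docs, n_semantic=500, dim=100):
    """Train word embeddings, cluster them, and keep the word closest to each
    cluster centre as a 'semantic word' (assumes the vocabulary > n_semantic)."""
    w2v = Word2Vec(sentences=tokenized_docs, vector_size=dim, min_count=2, epochs=10)
    vocab = list(w2v.wv.index_to_key)
    vectors = np.stack([w2v.wv[w] for w in vocab])
    km = KMeans(n_clusters=n_semantic, n_init=10).fit(vectors)
    semantic_words = []
    for c in range(n_semantic):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(vectors[members] - km.cluster_centers_[c], axis=1)
        semantic_words.append(vocab[members[np.argmin(dists)]])
    return semantic_words

def quantify(tokenized_docs, semantic_words):
    """Represent each document over the semantic words. Here the weight is plain
    term frequency; the thesis' weighted combination of word attributes is richer."""
    index = {w: i for i, w in enumerate(semantic_words)}
    X = np.zeros((len(tokenized_docs), len(semantic_words)))
    for d, doc in enumerate(tokenized_docs):
        for tok in doc:
            if tok in index:
                X[d, index[tok]] += 1.0
    return X

# Usage sketch (corpus loading is user-supplied):
# docs, labels = load_corpus()                      # tokenized docs, label matrix
# words = select_semantic_words(docs)
# clf = MLPClassifier(hidden_layer_sizes=(256, 128)).fit(quantify(docs, words), labels)
```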
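Similarly, the CCNN-GAN framework can be summarized as a character-level CNN encoder whose features feed both a GAN (for data augmentation) and a classifier. The PyTorch modules below are an illustrative sketch under that reading; the layer sizes, the 70-character alphabet, and generating in feature space rather than raw text are assumptions, not the thesis architecture.

```python
# Illustrative PyTorch sketch of the CCNN-GAN layout (module names are assumptions).
import torch
import torch.nn as nn

class CharCNNEncoder(nn.Module):
    """Character-level CNN: one-hot character channels -> convolutions -> feature vector."""
    def __init__(self, n_chars=70, feat_dim=256, max_len=1014):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_chars, 256, kernel_size=7), nn.ReLU(), nn.MaxPool1d(3),
            nn.Conv1d(256, 256, kernel_size=3), nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),
        )
        self.proj = nn.Linear(256, feat_dim)

    def forward(self, x):                      # x: (batch, n_chars, max_len)
        return self.proj(self.conv(x).squeeze(-1))

class Generator(nn.Module):
    """Maps noise to synthetic text features used for data augmentation."""
    def __init__(self, noise_dim=100, feat_dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(noise_dim, 256), nn.ReLU(), nn.Linear(256, feat_dim))

    def forward(self, z):
        return self.net(z)

class Discriminator(nn.Module):
    """Scores whether a feature vector comes from real or generated text."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, f):
        return self.net(f)

# The classifier is trained incrementally on real and generated features;
# the class count below is illustrative.
classifier = nn.Linear(256, 4)
```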
Keywords/Search Tags: text classification, deep belief network, sparse representation, character-level convolutional neural network, deep learning