Research On Text Classification Based On Deep Neural Network

Posted on: 2020-05-04
Degree: Master
Type: Thesis
Country: China
Candidate: W Zhang
GTID: 2518305954499394
Subject: Computer Science and Technology
Abstract/Summary:
Text categorization is one of the classical tasks in natural language processing; its goal is to identify the category to which a text belongs. It is widely used in email detection, sentiment analysis, and topic labeling. A good text representation method is the key to improving the performance of natural language processing tasks such as text categorization. Traditional text representation uses the bag-of-words model or a word embedding model, which causes the text to lose much of its semantic information. In recent years, with the explosive growth of text data and the tremendous improvement in computing performance, deep learning has attracted great attention in text representation and classification. Convolutional neural networks, recurrent neural networks, and attention mechanisms are used to represent and classify text, with better results than traditional machine learning.

However, word usage in online text is more arbitrary: netizens commonly coin new words that then become popular. When Chinese text is segmented, word segmentation dictionaries cannot recognize and correctly segment these online neologisms. The resulting segmentation errors make the text representation inaccurate, which limits the performance of text categorization models to some extent. This thesis presents a new word recognition technique and three text representation and classification models based on deep neural networks, as follows:

1. Zero-padding deep neural network model based on new word recognition (NW-ZPDNN). To address the problem that word segmentation tools cannot accurately recognize online neologisms, this thesis proposes a new word recognition technique that post-processes the output of word segmentation tools to obtain more accurate segmentation results. Drawing on the strengths of deep learning in text representation, we design the NW-ZPDNN model. We use zero padding to transform variable-length text into fixed-length text, a bidirectional recurrent neural network to extract high-level contextual semantic information, and a convolutional neural network to extract more abstract semantic information and reduce the computational load. A max pooling operation then extracts the key information of the text, and a softmax classifier classifies the text. Experiments show that the NW-ZPDNN model achieves high accuracy in text categorization.

2. Sliding recurrent neural network model based on new word recognition (NW-SLDNN). Because word segmentation tools recognize online neologisms poorly, the new word recognition technique proposed in this thesis is again used to obtain more accurate segmentation results. In addition, a sliding recurrent neural network is proposed that focuses on the local context information of the text. A 1x1 convolutional neural network adds non-linearity to increase the expressive power of the model, enables cross-channel communication, extracts higher-level text features, and reduces the computational load. A max pooling operation then extracts the key information of the text, and a softmax classifier classifies the text. Experiments show that the NW-SLDNN model achieves high accuracy in text categorization.

3. Attention-mechanism neural network model based on new word recognition (NW-AttenDNN). As in the other two models, the new word recognition technique proposed in this thesis is used to obtain more accurate segmentation results. For text information extraction, a dynamic recurrent neural network and an attention mechanism encode variable-length text and extract high-level semantic information. The encoding is then decoded into a fixed-length feature sequence. After transformation by a fully connected layer, a softmax classifier classifies the text. Experiments show that the NW-AttenDNN model achieves higher accuracy on text classification problems; because of the attention mechanism, the key information of the text is retained more completely and the model is interpretable.
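
The building blocks named in the abstract (zero padding, 1x1 convolution, max pooling, attention, and the softmax classifier) can be illustrated with a minimal pure-Python sketch. This is not the thesis's implementation: the abstract gives no layer sizes, weights, or attention scoring function, so the function names, the `PAD_ID` value, the toy inputs, and the choice of dot-product attention are all assumptions made for illustration.

```python
import math

PAD_ID = 0  # assumption: token id 0 is reserved for padding


def zero_pad(token_ids, max_len, pad_id=PAD_ID):
    """Truncate or right-pad a list of token ids to exactly max_len."""
    clipped = token_ids[:max_len]
    return clipped + [pad_id] * (max_len - len(clipped))


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def conv1x1_relu(features, weights, bias):
    """Apply a 1x1 convolution followed by ReLU.

    features: one vector of in_channels floats per text position.
    weights:  out_channels x in_channels matrix (hypothetical values).
    A 1x1 convolution applies the same linear map at every position,
    so it mixes channels only and never looks at neighbouring positions.
    """
    out = []
    for vec in features:
        row = []
        for w_row, b in zip(weights, bias):
            z = sum(w * x for w, x in zip(w_row, vec)) + b
            row.append(max(0.0, z))  # ReLU supplies the non-linearity
        out.append(row)
    return out


def max_over_time(features):
    """Keep the largest activation of each channel across all positions."""
    return [max(channel) for channel in zip(*features)]


def attention_pool(states, query):
    """Dot-product attention (an assumed scoring function).

    Returns a fixed-length context vector and the per-position weights;
    the weights show which positions contributed most.
    """
    scores = [sum(q * h for q, h in zip(query, state)) for state in states]
    weights = softmax(scores)
    dim = len(states[0])
    context = [sum(w * state[i] for w, state in zip(weights, states))
               for i in range(dim)]
    return context, weights


# Usage with toy values (all numbers are made up for illustration).
padded = zero_pad([5, 9, 2], max_len=5)           # [5, 9, 2, 0, 0]
feats = conv1x1_relu([[1.0, -1.0], [2.0, 0.5]],
                     weights=[[1.0, 1.0], [-1.0, 0.0]],
                     bias=[0.0, 0.0])
pooled = max_over_time(feats)                     # one value per channel
probs = softmax(pooled)                           # class probabilities
context, attn = attention_pool([[1.0, 0.0], [0.0, 1.0]], query=[0.0, 0.0])
```

Max-over-time pooling and attention pooling both reduce a variable number of positions to a fixed-length vector; the attention weights additionally expose how much each position contributed, which is the kind of interpretability the abstract attributes to NW-AttenDNN.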
Keywords/Search Tags: Natural Language Processing, Deep Learning, New Word Recognition, Text Representation, Text Classification