Research On News Text Classification Method Based On Deep Learning And Multi-feature Fusion

Posted on: 2022-08-15
Degree: Master
Type: Thesis
Country: China
Candidate: Y Li
GTID: 2518306734454514
Subject: Computer application technology

Abstract:
With the development of the Internet and the spread of wireless smart devices, new media has become the mainstream platform for information exchange, and the volume of online news text, which carries rich information, is growing explosively. News text classification is central to news information retrieval and mining. However, traditional manual labeling cannot keep pace with the scale of today's news classification tasks, which makes natural language processing the first choice for solving this problem. Deep learning, a leading technology in artificial intelligence, offers powerful feature representation and large-scale data processing capabilities, and has received extensive attention in text classification research. Most current deep-learning approaches to text classification use pre-trained word vectors as text representations and train convolutional neural networks (CNNs), recurrent neural networks (RNNs), or a hybrid of the two, achieving good classification accuracy. Several shortcomings remain, however. First, news text has distinct thematic characteristics, but word vectors generated by Word2Vec consider only local context information and lack global topic factors. Second, RNNs can effectively extract contextual information, but news articles are generally long; RNNs have limited capacity to encode long documents and cannot highlight the role of key hidden vectors. Third, news text has a natural hierarchical structure, so classification models need to capture the spatial relationships between sentences and the whole text, yet CNNs are limited in modeling spatial relationships between features. Based on this research background and the characteristics of news text data, we study optimization methods from the following two aspects.

(1) To address the distinctive topic characteristics of news text and the inability of RNNs to highlight key hidden vectors when processing long text sequences, we propose the Topic-enhanced Attention Convolutional Gated Recurrent Network (T-ACmGRU). Fusing CBOW word vectors with LDA topic features adds a global topic factor to the word vectors and effectively improves the quality of the text representation. To reduce the model's computational complexity, we simplify the GRU memory unit and, because the meaning of text depends on its surrounding context, propose a BmGRU module that establishes global dependencies among local phrase features and speeds up computation. An attention mechanism then computes a weighted sum of the BmGRU hidden states using attention scores, which highlights key hidden vectors, reduces redundant features, and improves classification accuracy. Comparative experiments on three public datasets show that T-ACmGRU achieves better results on news text classification than the compared models. Ablation experiments show that the modules of T-ACmGRU are complementary and that news topic features make text representations more distinguishable and improve classification accuracy.

(2) News text has a natural hierarchical structure, so classification models must attend to the spatial positional relationships between sentences and the whole text, and some existing models that do consider hierarchical structure extract features insufficiently. To address this, we propose the Hierarchical Neural Network Based on Capsule Structure (HCapsNet). Rather than modeling an entire news article directly, HCapsNet uses the article's natural hierarchy to divide it into three levels (words, sentences, and chapters) and designs a corresponding feature extractor for each level. Word vector representations are obtained by pre-training a CBOW network. A parallel dynamic LSTM network encodes the global semantic features of each sentence, and its parallel structure speeds up computation. A capsule structure packs each sentence's features into a vector-valued capsule, increasing the model's capacity to store text feature information. A dynamic routing algorithm aggregates sentence capsules (SCaps) into chapter capsules (CCaps) and establishes the spatial positional relationship between sentences and chapters: by iteratively computing the similarity between SCaps and CCaps, it dynamically adjusts the connection strength between them, which increases the contribution of key sentences to the chapter's semantics and improves classification performance. Comparative experiments on three public news datasets show that the HCapsule model has significant advantages on news classification tasks and is more computationally efficient than multiple capsule-network-based models. Finally, we use the topic-enhanced text representation VecLDA as the input of HCapsule to obtain the T-HCapsule model. Its Micro-F1 score improves, indicating that multi-feature fusion benefits classification.
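The thesis fuses CBOW word vectors with LDA topic features, but it does not say how the two are combined. The sketch below shows one simple option and should be read as an assumption, not as the thesis's method. Each word's topic distribution p(z|w) is derived from an LDA topic-word matrix `phi` with Bayes' rule and then concatenated to the word's CBOW vector. The names `word_topic_distribution` and `fuse`, the uniform topic prior, and all numbers are hypothetical.

```python
from typing import List

Vector = List[float]


def word_topic_distribution(phi: List[Vector], topic_prior: Vector, word_id: int) -> Vector:
    """p(z | w) proportional to p(w | z) * p(z), where phi[z][w] = p(w | z) from LDA."""
    unnorm = [phi[z][word_id] * topic_prior[z] for z in range(len(phi))]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def fuse(cbow_vec: Vector, topic_dist: Vector) -> Vector:
    """Hypothetical fusion: concatenate the CBOW embedding with p(z | w)."""
    return list(cbow_vec) + list(topic_dist)


# Toy LDA output: 2 topics over a 3-word vocabulary (each row sums to 1).
phi = [[0.5, 0.3, 0.2],
       [0.1, 0.1, 0.8]]
topic_prior = [0.5, 0.5]        # assumed uniform topic prior
cbow_vec = [0.12, -0.40, 0.33]  # hypothetical 3-d CBOW vector for word 2

topic_dist = word_topic_distribution(phi, topic_prior, word_id=2)
fused = fuse(cbow_vec, topic_dist)
print(fused)
```

Concatenation keeps the local-context (CBOW) and global-topic (LDA) parts in separate dimensions, so a downstream network can weight them independently. Weighted addition or a learned projection would be equally plausible choices.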
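The attention step in T-ACmGRU (weighting the BmGRU hidden states by attention scores and summing them) can be sketched as below. This is a minimal pure-Python illustration under stated assumptions, not the thesis implementation. The score is a plain dot product between each hidden state and a context vector `u` (the hypothetical `context` argument, which would be learned in a real model), followed by a softmax over time steps. The thesis does not specify its scoring function, and additive scoring such as u·tanh(Wh + b) is another common choice.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden: List[Vector], context: Vector) -> Tuple[Vector, List[float]]:
    """Return (sum_t alpha_t * h_t, alpha), with alpha = softmax over t of h_t . u.

    `context` stands in for u, a learned query vector (assumed dot-product scoring).
    """
    scores = [sum(h_d * u_d for h_d, u_d in zip(h, context)) for h in hidden]
    weights = softmax(scores)
    dim = len(hidden[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, hidden)) for d in range(dim)]
    return pooled, weights


# Toy example: three 2-d hidden states; the second aligns best with u.
hidden_states = [[0.1, 0.0], [1.0, 1.0], [0.0, 0.2]]
pooled, weights = attention_pool(hidden_states, context=[1.0, 1.0])
print([round(w, 3) for w in weights])
```

The hidden state with the highest score dominates the pooled vector. This is the "highlighting key hidden vectors" effect described above, in contrast to taking only the last hidden state of a long sequence.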
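The SCap-to-CCap aggregation in HCapsNet is described as iteratively adjusting connection strengths from the similarity between sentence and chapter capsules. That matches, in outline, the routing-by-agreement procedure from the original capsule network work (Sabour, Frosst and Hinton, 2017), which the sketch below implements in pure Python. Assumptions: the prediction vectors `u_hat[i][j]` (sentence capsule i's prediction for chapter capsule j, normally produced by a learned transformation matrix) are given as input, similarity is a dot product, and three iterations are used. The thesis's exact variant is not stated.

```python
import math
from typing import List, Tuple

Vector = List[float]


def squash(s: Vector) -> Vector:
    """Capsule non-linearity: keeps the direction, maps the length into [0, 1)."""
    norm_sq = sum(x * x for x in s)
    if norm_sq == 0.0:
        return [0.0 for _ in s]
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq)
    return [scale * x for x in s]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_routing(u_hat: List[List[Vector]],
                    iterations: int = 3) -> Tuple[List[Vector], List[List[float]]]:
    """Route sentence capsules (index i) to chapter capsules (index j).

    Returns the chapter capsules v_j and the coupling coefficients c_ij
    used to compute them in the final iteration.
    """
    if iterations < 1:
        raise ValueError("iterations must be >= 1")
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits start equal
    for r in range(iterations):
        # Each sentence capsule distributes its output over the chapter capsules.
        c = [softmax(b_i) for b_i in b]
        # Coupling-weighted sum of predictions, then squash, gives each v_j.
        v = []
        for j in range(n_out):
            s_j = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in)) for d in range(dim)]
            v.append(squash(s_j))
        if r < iterations - 1:
            # Agreement (dot product) between a prediction and the output raises its logit.
            for i in range(n_in):
                for j in range(n_out):
                    b[i][j] += sum(p * q for p, q in zip(u_hat[i][j], v[j]))
    return v, c


# Toy example: 3 sentence capsules, 2 chapter capsules, 2-d predictions.
u_hat = [
    [[1.0, 0.0], [0.0, 0.1]],
    [[0.9, 0.1], [0.1, 0.0]],
    [[0.0, 0.2], [0.2, 0.1]],
]
v, c = dynamic_routing(u_hat, iterations=3)
lengths = [math.sqrt(sum(x * x for x in v_j)) for v_j in v]
print([round(l, 3) for l in lengths])
```

Sentences whose predictions agree with a chapter capsule's output receive larger coupling coefficients toward it. This is how routing can raise the contribution of key sentences. In classification, the length of each output capsule is typically read as the score for the corresponding class.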
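Results above are reported as Micro-F1. The sketch below shows the standard definition of that metric: true positives, false positives and false negatives are pooled over all classes before precision and recall are computed once. The thesis does not describe its own evaluation code, and the category labels used here are toy values.

```python
from typing import Hashable, Iterable, Optional, Sequence


def micro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
             labels: Optional[Iterable[Hashable]] = None) -> float:
    """Micro-averaged F1: pool TP/FP/FN over all classes, then compute F1 once."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    label_set = set(labels) if labels is not None else set(y_true) | set(y_pred)
    tp = fp = fn = 0
    for label in label_set:
        for t, p in zip(y_true, y_pred):
            if p == label and t == label:
                tp += 1
            elif p == label:
                fp += 1
            elif t == label:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


y_true = ["sports", "politics", "tech", "sports"]
y_pred = ["sports", "tech", "tech", "politics"]
print(micro_f1(y_true, y_pred))
```

For single-label, multi-class classification over the full label set, every wrong prediction counts once as a false positive and once as a false negative. Micro-F1 therefore equals accuracy in that setting; here 2 of the 4 predictions are correct.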
Keywords: Deep Learning, News Text Classification, Multi-Feature Fusion, Capsule Network, Topic Enhancement