Research On Chinese News Classification Algorithm Based On Deep Learning

Posted on: 2021-12-13 | Degree: Master | Type: Thesis
Country: China | Candidate: M M Dou | Full Text: PDF
GTID: 2518306521489244 | Subject: Master of Engineering
Abstract/Summary:
With the rapid development of the big data era, the volume and complexity of text data are growing quickly, and there is an urgent need for more effective ways to classify and manage these resources. Text classification can process text information effectively and improve its utilization. News is one of the main channels through which people obtain information and follow current affairs, and its content consists mainly of unstructured text. Studying news classification is therefore of practical significance and supports the development of fields such as personalized news recommendation and advertising push. This thesis uses deep learning to study news classification; the main work is as follows. First, it introduces the research background and significance of text classification, analyzes the state of research at home and abroad, summarizes the problems that remain at this stage, and then proposes corresponding improved algorithms from the perspective of news classification. Second, to address the insufficient feature extraction of traditional convolutional neural networks in Chinese text classification, along with their difficulty in handling sentence structure information and capturing long-distance dependencies, a hybrid neural network classification model, TC-ABlstm (Text Convolutional Attention Bidirectional Long Short-Term Memory), is proposed. The model improves the traditional convolutional neural network to strengthen the extraction of local text features, builds a bidirectional long short-term memory network combined with an attention mechanism to capture the global features of the text context, and then combines the strengths of the two to improve classification accuracy. Third, to address polysemy in the word vectors trained by common pre-training models and the influence of word segmentation on Chinese text, the BERT model is used to produce the word vector representation. Because news texts are relatively long and the BERT model limits the length of text it can represent, the TextRank algorithm is first used to extract the key sentences of each news item, so that the input better represents the full text, before the BERT model is used for classification. Finally, the two algorithms proposed in this thesis are tested on two real data sets. The results show that both models effectively improve the accuracy of Chinese news classification.
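The abstract does not describe how the TextRank step is implemented, so the following is only a minimal sketch of TextRank-style key sentence extraction under stated assumptions. It ranks sentences by running a PageRank-style iteration over a sentence similarity graph and keeps the top-ranked ones in their original order. The function names, the use of character bigrams in place of a Chinese word segmenter, and the values of `top_k`, `damping`, and `iterations` are all illustrative choices, not details taken from the thesis.

```python
import math
import re


def split_sentences(text):
    """Split on common Chinese and Western sentence-ending punctuation."""
    parts = re.split(r"[。！？!?]+", text)
    return [p.strip() for p in parts if p.strip()]


def char_bigrams(sentence):
    """Assumed stand-in for word segmentation: overlapping character bigrams."""
    s = re.sub(r"\s+", "", sentence)
    return {s[i:i + 2] for i in range(len(s) - 1)}


def similarity(a, b):
    """Token overlap of two sets, normalised by the logs of their sizes."""
    if len(a) < 2 or len(b) < 2:
        return 0.0  # avoids a zero denominator for very short sentences
    return len(a & b) / (math.log(len(a)) + math.log(len(b)))


def textrank_sentences(text, top_k=3, damping=0.85, iterations=50):
    """Return the top_k highest-ranked sentences, kept in document order."""
    sentences = split_sentences(text)
    n = len(sentences)
    if n <= top_k:
        return sentences

    tokens = [char_bigrams(s) for s in sentences]
    # Symmetric weighted graph: weights[i][j] is the similarity of i and j.
    weights = [[similarity(tokens[i], tokens[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]

    scores = [1.0] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each neighbour j passes on its score in proportion to w(j, i).
            rank = sum(weights[j][i] / out_sum[j] * scores[j]
                       for j in range(n) if out_sum[j] > 0)
            new_scores.append((1 - damping) + damping * rank)
        scores = new_scores

    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_k]
    return [sentences[i] for i in sorted(top)]
```

In a pipeline like the one the abstract outlines, the selected sentences would be joined and passed to the BERT classifier in place of the full article. How many sentences to keep would depend on the model's input length limit, which the abstract does not specify.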
Keywords/Search Tags: deep learning, text categorization, convolutional neural network, BiLSTM, BERT model