
Research On News Classification Algorithm Based On Hierarchical Neural Network

Posted on: 2020-12-08 | Degree: Master | Type: Thesis
Country: China | Candidate: C Y Li
GTID: 2428330602954327 | Subject: Management Science and Engineering
Abstract/Summary:
With the development of information technology, the volume of information is growing rapidly, and how to obtain the required information quickly and efficiently from massive information resources has attracted considerable attention. Text is an important carrier of information, and extracting and representing text content is the key to managing text information. News is among the most common kinds of data people encounter every day; classifying it accurately shortens the time readers spend finding content and improves the reading experience.

Text representation is the basis of text categorization. Traditional text representations are count-based. Such methods assume that words are independent of one another and therefore ignore the semantic information of the text; moreover, feature selection requires manual feature engineering, and the extracted features are high-dimensional and highly sparse, so they cannot represent text information effectively. In recent years, deep learning has been proposed and widely adopted. Its hierarchical network architecture can effectively extract and fuse features layer by layer, from low-level to high-level features, which provides strong support for extracting text representations and building accurate classification models.

This paper studies news text classification based on hierarchical neural network models. The main work is as follows:

A news crawler is designed according to the collection requirements, and the collected news data are preprocessed. For Chinese word segmentation, the Python-based Jieba tool is used, and Jieba's user-defined dictionary is used to add out-of-vocabulary words and thereby improve segmentation quality. A distributed representation, which converts text into low-dimensional dense vectors, is adopted to solve the high-dimensional sparsity problem of traditional text representation methods and to avoid the curse of dimensionality caused by high-dimensional input. Through analysis of the underlying principles and experimental comparison of language models for training word vectors, the language model used for word-vector training and the word-vector dimension are determined.

For the hierarchical network classification models, by analyzing how news headlines and body text differ in length and in the amount of information they carry, two hierarchical-network models for news text classification are proposed:
(1) The title is treated as an ordinary sentence and encoded together with the body text; after the sentence representations are complete, the title representation is separated from the content representation, which prevents the title information from being lost in subsequent processing.
(2) A convolutional neural network is introduced to extract features from the title and the body separately, and the two sets of features are fused once each extraction is complete.

An attention mechanism is introduced at each level of feature extraction to determine how much weight each feature receives. Finally, combining title and body data also serves to reduce model computation: the headline information is used as the main input and the body text as a feature extension, which reasonably reduces the amount of body text to be processed and thereby improves the efficiency of news text classification.
Keywords: Text Representation, News Classification, Hierarchical Neural Networks, Attention Mechanism
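The abstract argues that count-based representations treat words as independent and produce high-dimensional, sparse vectors. The following minimal Python sketch makes that concrete on three hypothetical, already-segmented headlines (English tokens stand in for Jieba output): every distinct word gets its own dimension, most entries are zero, and two headlines with the same meaning but no shared words get a cosine similarity of 0. The helper names (`build_vocab`, `count_vector`, `cosine`) are illustrative and not taken from the thesis.

```python
from collections import Counter
import math


def build_vocab(docs):
    """Assign each distinct token in the corpus its own column index."""
    vocab = {}
    for doc in docs:
        for tok in doc:
            vocab.setdefault(tok, len(vocab))
    return vocab


def count_vector(doc, vocab):
    """Count-based (bag-of-words) vector: one dimension per vocabulary word."""
    vec = [0] * len(vocab)
    for tok, n in Counter(doc).items():
        vec[vocab[tok]] = n
    return vec


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical, already-segmented headlines; the first two mean the same thing.
docs = [
    ["team", "wins", "match"],
    ["squad", "victory", "game"],
    ["stocks", "rise", "today"],
]
vocab = build_vocab(docs)
vecs = [count_vector(d, vocab) for d in docs]

# Fraction of zero entries across all document vectors.
sparsity = sum(x == 0 for v in vecs for x in v) / (len(vecs) * len(vocab))

print(len(vocab))                # 9
print(round(sparsity, 3))        # 0.667
print(cosine(vecs[0], vecs[1]))  # 0.0: no shared words, so no measured similarity
```

With a real news vocabulary of tens of thousands of words, the zero fraction is far higher. Dense word vectors target exactly this case: synonyms such as "wins" and "victory" can be mapped to nearby points, so the two headlines no longer look unrelated.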
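The abstract states that the word-vector language model was chosen by principle analysis and experimental comparison, but it does not name the candidates. Assuming they were the two word2vec architectures, CBOW and skip-gram, this sketch shows how each turns a segmented sentence into training examples. It uses a fixed context window for clarity; the reference word2vec implementation also samples a smaller effective window and subsamples frequent words. The function names are hypothetical.

```python
def skipgram_pairs(tokens, window=2):
    """(center, context) pairs: skip-gram predicts each context word from the center word."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens, window=2):
    """(context, center) examples: CBOW predicts the center word from its
    context words (whose vectors word2vec averages)."""
    examples = []
    for i, center in enumerate(tokens):
        context = [tokens[j]
                   for j in range(max(0, i - window), min(len(tokens), i + window + 1))
                   if j != i]
        examples.append((context, center))
    return examples


sentence = ["stocks", "rise", "sharply", "today"]
print(skipgram_pairs(sentence, window=1))
# [('stocks', 'rise'), ('rise', 'stocks'), ('rise', 'sharply'),
#  ('sharply', 'rise'), ('sharply', 'today'), ('today', 'sharply')]
print(cbow_examples(sentence, window=1))
# [(['rise'], 'stocks'), (['stocks', 'sharply'], 'rise'),
#  (['rise', 'today'], 'sharply'), (['sharply'], 'today')]
```

Skip-gram produces one example per (center, context) pair and so sees rare words more often, while CBOW produces one example per position; that trade-off is the kind of difference an experimental comparison of the two would measure, alongside the embedding dimension.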
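Model (1) in the abstract encodes the title as one more sentence and separates it from the body only after sentence vectors exist. Here is a minimal pure-Python sketch of that ordering, assuming simple dot-product attention pooling at both the word and the sentence level. It omits the recurrent encoders and the learned projection tanh(W h + b) that hierarchical attention networks normally apply before scoring, and its context vectors are fixed toy values rather than learned parameters. Concatenating the title vector with the attention-pooled body vector is likewise an assumption about how the separated representations are combined.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(vectors, context):
    """Dot-product attention pooling: score each vector against `context`,
    softmax the scores, and return (weighted sum, weights)."""
    weights = softmax([sum(a * b for a, b in zip(v, context)) for v in vectors])
    dim = len(vectors[0])
    pooled = [sum(w * v[k] for w, v in zip(weights, vectors)) for k in range(dim)]
    return pooled, weights


def encode_news(sentences, word_context, sent_context):
    """sentences[0] is the title, sentences[1:] the body (at least one sentence);
    each sentence is a list of word vectors.
    Word-level attention builds every sentence vector, title included;
    only then is the title vector split off, and sentence-level attention
    pools the body sentences alone."""
    sent_vecs = [attend(words, word_context)[0] for words in sentences]
    title_vec = sent_vecs[0]
    body_vec, _ = attend(sent_vecs[1:], sent_context)
    return title_vec + body_vec  # list concatenation = feature concatenation


# Toy 2-d word vectors and context vectors (hypothetical values).
title = [[1.0, 0.0], [1.0, 0.0]]
body = [[[0.0, 1.0]], [[0.0, 1.0], [0.0, 1.0]]]
features = encode_news([title] + body, word_context=[1.0, 1.0], sent_context=[1.0, 1.0])
print(features)  # [1.0, 0.0, 0.0, 1.0]
```

Because the title goes through the same word-level attention as every body sentence but is kept out of sentence-level pooling, a long body cannot dilute it; this is one reading of the abstract's point that the design avoids losing title information.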
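Model (2) in the abstract extracts title and body features separately with a convolutional neural network and fuses them afterwards. This sketch assumes a common text-CNN design (each filter slides over windows of consecutive word vectors, followed by ReLU and max-over-time pooling) and fusion by concatenation; the attention layers mentioned in the abstract are left out. The `max_body_tokens` cap is a hypothetical way to reduce how much body text is processed, since the abstract does not say how that reduction is done.

```python
def conv1d_max_features(seq, filters, bias=0.0):
    """Text-CNN style feature extraction.
    seq: list of token vectors (each of length d).
    filters: list of filters, each a list of k rows of length d (k = window width).
    For each filter, slide it over every window of k consecutive tokens,
    apply ReLU, and keep the maximum (max-over-time pooling)."""
    features = []
    for filt in filters:
        k = len(filt)
        best = 0.0  # ReLU floor; also the result when seq is shorter than k
        for start in range(len(seq) - k + 1):
            activation = bias + sum(
                filt[r][c] * seq[start + r][c]
                for r in range(k)
                for c in range(len(filt[r]))
            )
            best = max(best, activation)
        features.append(best)
    return features


def fuse_title_body(title_seq, body_seq, title_filters, body_filters, max_body_tokens=100):
    """Extract title and body features with separate filter banks, then fuse
    by concatenation. max_body_tokens is a hypothetical cap on body length."""
    body_seq = body_seq[:max_body_tokens]
    return (conv1d_max_features(title_seq, title_filters)
            + conv1d_max_features(body_seq, body_filters))


# Toy 2-d token vectors and hand-set filters (hypothetical values).
title_seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
title_filters = [[[1.0, 0.0], [0.0, 1.0]],    # width-2 filter
                 [[-1.0, 0.0], [0.0, -1.0]]]  # negative on every window here, so ReLU gives 0
body_seq = [[1.0, 1.0]] * 3
body_filters = [[[1.0, 1.0]]]                 # width-1 filter
print(fuse_title_body(title_seq, body_seq, title_filters, body_filters))  # [2.0, 0.0, 2.0]
```

Giving the title and the body their own filter banks lets each learn patterns suited to its length; in a trained model the filters would be learned weights rather than the hand-set values used here.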