
Research on the Application of News Text Classification Based on Deep Learning

Posted on: 2022-06-16
Degree: Master
Type: Thesis
Country: China
Candidate: L C Sun
GTID: 2518306785476364
Subject: Automation Technology
Abstract/Summary:
With the rapid development of information networks in recent years, the information age marked by big data and deep learning has arrived. The number of Internet users has increased dramatically, and the volume of data has grown explosively. Of this data, 80% is in text form, and it holds considerable commercial and scientific research value. How to mine valuable information from text data is therefore a hot research topic, and text classification has received extensive attention from researchers. This thesis studies the text classification problem and proposes new approaches to address the deficiencies of some existing algorithms and improve classification performance. The main work is as follows:

(1) Text data collected from the network is usually imbalanced across classes. To address the shortcomings of conventional methods for handling imbalanced text data, this thesis works at both the data level and the algorithm level and proposes a distance-based oversampling algorithm combined with Bagging. At the data level, the minority classes are oversampled: a distance threshold is set, and samples are assigned different weights according to the distance between them. For example, within a given cluster, samples whose distance exceeds the threshold are assigned more weight, and samples below the threshold are assigned less. At the algorithm level, the Bagging method is used to handle the majority classes. The two methods are combined, and multiple experiments verify the effectiveness of the approach.

(2) Traditional machine learning methods suffer from weak feature representation, high text dimensionality, and sparse matrices when processing large-scale, relatively balanced text. To address these problems, this thesis proposes a text classification model, LSTM-A (LSTM + Attention), which introduces an attention mechanism on top of a long short-term memory (LSTM) network. The model relies on the gating units of the LSTM and adds the attention mechanism to strengthen feature transfer between network layers. It generates different weights for different sentences to reduce the loss of key features. Finally, experiments on three text data sets verify the superiority of this method.
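The two ideas described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation, and the abstract does not give the details assumed here: distance is measured from each minority sample to the minority centroid, the weights are fixed at 2.0 above the threshold and 1.0 below it, and attention uses plain dot-product scoring against a hypothetical query vector. The function names `distance_weights`, `balanced_bags`, and `attention_pool` are illustrative only, and the hidden states passed to `attention_pool` stand in for the outputs of an LSTM, which is not implemented here.

```python
import math
import random


def distance_weights(minority, threshold):
    """Weight each minority sample by its distance to the minority centroid."""
    dim = len(minority[0])
    centroid = [sum(x[d] for x in minority) / len(minority) for d in range(dim)]
    weights = []
    for x in minority:
        dist = math.dist(x, centroid)
        # Assumption: beyond the threshold a sample gets weight 2.0, otherwise 1.0.
        weights.append(2.0 if dist > threshold else 1.0)
    return weights


def balanced_bags(majority, minority, n_bags, bag_size, threshold, seed=0):
    """Build training bags: bootstrap the majority class, oversample the minority."""
    rng = random.Random(seed)
    weights = distance_weights(minority, threshold)
    bags = []
    for _ in range(n_bags):
        # Bagging side: sample the majority class with replacement.
        maj = rng.choices(majority, k=bag_size)
        # Oversampling side: draw minority samples in proportion to their weights.
        mino = rng.choices(minority, weights=weights, k=bag_size)
        bags.append((maj, mino))
    return bags


def attention_pool(hidden_states, query):
    """Softmax-weighted sum of hidden states, scored by dot product with query."""
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    dim = len(hidden_states[0])
    context = [sum(a * h[d] for a, h in zip(alphas, hidden_states)) for d in range(dim)]
    return alphas, context
```

In such a setup, one classifier would be trained per bag and the predictions combined by voting, while the context vector returned by `attention_pool` would feed the final classification layer. Both steps are left out to keep the sketch short.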
Keywords/Search Tags:unbalanced text, deep learning, text mining, text classification, attention mechanism