Research On Security News Popularity Prediction Based On Deep Learning

Posted on: 2020-11-27
Degree: Master
Type: Thesis
Country: China
Candidate: J L Kong
Full Text: PDF
GTID: 2428330602950247
Subject: Engineering
Abstract/Summary:
News popularity prediction aims to predict the number of hits, comments, or forwards that a news item will receive in the future. Popularity predictions support news quality assessment, ranking, recommendation, and retrieval, and they can help alleviate the information explosion and information overload caused by the rapid development of online and social media. However, because news is time-sensitive and has a short life cycle, post-release prediction is of limited use. Pre-release prediction also faces major challenges, because the influencing factors are diverse and difficult to define. Existing work on pre-release popularity prediction cannot process multi-source, rough data sets and suffers from large prediction errors. This paper proposes an improved graph-ranking key sentence extraction algorithm based on Doc2vec to extract key sentences from news. Features are then extracted by multi-feature fusion from the surface information and key sentences of the news, and a regression prediction model is trained with a gated recurrent unit (GRU) neural network. The result is a news popularity prediction framework that can process multi-source, rough data sets and greatly reduce prediction error. The work of this paper mainly consists of the following four points:

(1) A web crawler was designed and used to collect different types of security news from 10 domestic information security portals, yielding 25,939 news items with differing structures. After preprocessing such as deduplication and filtering, the data can be used for news classification, popularity prediction, and natural language processing.

(2) Popularity prediction before release depends mainly on the news item itself. News from different sources and of different types has different structures and constituent elements, so how features are mined and extracted from the basic elements of the news is a key factor in the accuracy of the framework's predictions. This paper proposes a feature extraction method based on multi-feature fusion. With the generality and future portability of the framework in mind, basic features are mined from the news data set to construct a text feature subset and a metadata feature subset. In addition, key sentences are extracted from the news to construct a content feature subset. The text, metadata, and content feature subsets are merged to form the final feature set. Multi-feature fusion makes full use of the basic information of the news while also extracting latent features of the news content.

(3) Compared with surface information such as the headline, author, and category, the news body contains more latent information, such as subject matter, writing style, and freshness. The data set used in this paper consists of news from 10 information security portals; it is rough and its structure is not standardized. About 80% of the news items are of medium length and contain a large number of redundant sentences, so extracting features directly from the news body would incur a heavy and largely wasted computational cost. This paper therefore proposes a key sentence extraction algorithm that extracts key sentences from the news body. Sentence vectors are obtained with Doc2vec, and the initial score of each sentence is determined from its TextRank score and a score based on the sentence's own characteristics. The final sentence score is then calculated by applying similarity rebalancing to the sentences ranked by initial score.

(4) For regression prediction of news popularity before release, the commonly used model is machine-learning linear regression, and simple linear regression has a large prediction error. This paper is the first to introduce the gated recurrent unit, a simplified variant of the long short-term memory (LSTM) network, together with a fully connected layer to train a regression prediction model in this field. By combining multi-feature fusion extraction with GRU-based regression prediction, this paper proposes a multi-feature fusion news popularity prediction framework based on deep learning. Compared with traditional methods, the framework can handle the multi-source, rough data sets used in this paper and greatly reduce prediction error. Moreover, because the gated recurrent unit structure is simpler than the long short-term memory structure, it shortens prediction time and improves computational performance.
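
The key sentence extraction outlined in point (3) can be illustrated with a minimal Python sketch. It is not the thesis's implementation. It assumes the sentence vectors have already been produced by a Doc2vec model and that each sentence comes with a precomputed self-characteristic score in [0, 1]. The weights `alpha` and `penalty`, the function names, and the greedy form of the similarity rebalancing step are all hypothetical choices made for illustration; the abstract does not specify them.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def textrank(sim, damping=0.85, iters=50):
    """Weighted PageRank over the sentence similarity graph (power iteration)."""
    n = len(sim)
    scores = [1.0 / n] * n
    # Total outgoing edge weight of each sentence, ignoring self-similarity.
    out = [sum(sim[j][k] for k in range(n) if k != j) for j in range(n)]
    for _ in range(iters):
        scores = [
            (1 - damping) / n
            + damping * sum(
                sim[j][i] / out[j] * scores[j]
                for j in range(n)
                if j != i and out[j] > 0
            )
            for i in range(n)
        ]
    return scores


def extract_key_sentences(vectors, self_scores, k=2, alpha=0.7, penalty=0.5):
    """Return the indices of k key sentences, in the order they are selected.

    vectors:     one embedding per sentence (assumed to come from Doc2vec)
    self_scores: per-sentence feature score in [0, 1] (assumed precomputed)
    alpha:       hypothetical weight of TextRank vs. the self-characteristic score
    penalty:     hypothetical strength of the similarity rebalancing
    """
    n = len(vectors)
    if n == 0:
        return []
    # Negative similarities are clipped so that edge weights stay non-negative.
    sim = [[max(cosine(vectors[i], vectors[j]), 0.0) for j in range(n)]
           for i in range(n)]
    ranks = textrank(sim)
    top = max(ranks)
    # Initial score: normalized TextRank score blended with the self score.
    initial = [alpha * r / top + (1 - alpha) * s
               for r, s in zip(ranks, self_scores)]

    selected = []
    remaining = set(range(n))
    while remaining and len(selected) < k:
        def final_score(i):
            # Rebalancing: lower the score of sentences that resemble
            # a sentence that has already been selected.
            redundancy = max((sim[i][j] for j in selected), default=0.0)
            return initial[i] - penalty * redundancy

        best = max(sorted(remaining), key=final_score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With `penalty=0.0` the function simply returns the top `k` sentences by initial score. A larger `penalty` pushes near-duplicate sentences down the ranking, which is the intent of the rebalancing step described above.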
Keywords/Search Tags: Popularity Prediction, Gated Recurrent Unit, Multi-feature Fusion, Key Sentence Extraction