Research On High Risk Information Processing Module Of Internet Public Opinion Based On Natural Language Processing

Posted on: 2020-11-11
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Hao
GTID: 2428330596492278
Subject: Computer technology
Abstract/Summary:
Nowadays, with the rapid growth of the internet, and especially the mobile internet, new forms of communication have been widely adopted in social life, and the internet has become an important channel for spreading information and participating in social affairs. As a result, online public opinion supervision and big data analysis and application play an increasingly important role. Traditional public opinion analysis relies on a system built from statistics and rules, which requires long-term vocabulary screening and manual review to establish a reasonably complete rule mechanism. However, the volume of internet information is growing exponentially, so simply adding more manpower is no longer feasible, and finding an intelligent information processing method has become an urgent task. This paper focuses on the high-risk early-warning module and aims to build a module with a high recall rate; the key research question is how to deliver emergencies accurately to the relevant departments. To improve recall and accuracy, this paper redesigns the key module and rewrites the warning rules, combining them with deep learning and machine learning algorithms:
(1) The corpus is preprocessed and roughly labeled, and the prediction task for the “whether to intervene” model is set up.
(2) A BiLSTM deep learning model is trained as a text classifier for the “information intervention” task, dividing the corpus into three categories: interventional, non-interventional, and irrelevant.
(3) Several machine learning models are built. Their features come from modules already processed in the streaming pipeline, for example automatic classification results, keyword match counts, positive and negative indices, geographic features, and the output of the trained “whether to intervene” model. These features form the dataset, which is used to train different algorithms and produce the final predictions. The advantages of the different algorithms are then compared to select the most suitable model for online data analysis.
The first part of the paper constructs the “information intervention” module, which uses a deep learning algorithm to determine whether a sentence is “intervention information”. Deep learning plays a very important role in the module design: adding the extracted intervention information improves the accuracy of the logistic regression algorithm by four percentage points. The second part is the high-risk public opinion determination module, which uses machine learning rather than deep learning algorithms, mainly because the deep learning algorithms cannot reach the speed required for online use. The final model uses a deep neural network to extract some of the features; all features are then fed into a gradient boosting decision tree for further feature combination, and finally a logistic regression algorithm classifies high-risk public opinion. This paper uses a corpus of 400,000 items to evaluate these models and compare them with the traditional rule-based data processing method. The results show that deep learning works well as a feature-processing stage, and that the combination of a gradient boosting decision tree and a logistic regression model is the most suitable for this service. Compared with the original method, the approach in this paper improves the accuracy of the high-risk information module.
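The feature set described in step (3) can be illustrated with a minimal Python sketch that flattens the outputs of the upstream streaming modules into one numeric row. The abstract does not specify field names or encodings, so the record layout, the `CATEGORIES` label set, and the reduction of the geographic feature to a single "watched region" flag are all hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical label set of the upstream automatic classifier (assumption).
CATEGORIES = ["politics", "society", "disaster", "other"]


@dataclass
class UpstreamOutputs:
    """Outputs already produced by earlier streaming modules (all field names hypothetical)."""
    category: str            # automatic classification result
    keyword_hits: int        # number of matched warning keywords
    positive_index: float    # positive index
    negative_index: float    # negative index
    in_watched_region: bool  # geographic feature, reduced here to one flag (assumption)
    intervene_prob: float    # output of the "whether to intervene" model


def to_feature_vector(o: UpstreamOutputs) -> List[float]:
    """Flatten one record into a numeric row for the downstream classifier."""
    # One-hot encode the category; an unseen label maps to all zeros.
    one_hot = [1.0 if o.category == c else 0.0 for c in CATEGORIES]
    return one_hot + [
        float(o.keyword_hits),
        o.positive_index,
        o.negative_index,
        1.0 if o.in_watched_region else 0.0,
        o.intervene_prob,
    ]


row = to_feature_vector(UpstreamOutputs("disaster", 3, 0.1, 0.8, True, 0.92))
# row == [0.0, 0.0, 1.0, 0.0, 3.0, 0.1, 0.8, 1.0, 0.92]
```

Keeping the intervention model's output as just another column is what lets a cheap downstream classifier reuse the deep model's judgement without running the deep model inside the classifier itself.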
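The final pipeline (a gradient boosting decision tree for feature combination, followed by logistic regression) can be sketched in dependency-free Python. This is an illustration of the general GBDT-leaf-plus-LR technique, not the thesis implementation. It assumes binary labels (1 = high-risk), uses depth-1 trees (stumps) whose leaf value is the mean pseudo-residual of the log loss, encodes each sample by the leaf it reaches in every tree, and trains logistic regression on those one-hot leaf indicators with plain batch gradient descent. All function names and hyperparameters are illustrative.

```python
import math
from typing import List, Optional, Tuple

# (feature index, threshold, left leaf value, right leaf value)
Stump = Tuple[int, float, float, float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X: List[List[float]], r: List[float]) -> Optional[Stump]:
    """Least-squares depth-1 tree fitted to the pseudo-residuals r."""
    best: Optional[Stump] = None
    best_sse = float("inf")
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2  # candidate threshold between two distinct values
            left = [ri for row, ri in zip(X, r) if row[j] <= t]
            right = [ri for row, ri in zip(X, r) if row[j] > t]
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lv) ** 2 for v in left) + sum((v - rv) ** 2 for v in right)
            if sse < best_sse:
                best, best_sse = (j, t, lv, rv), sse
    return best


def fit_gbdt(X, y, n_trees: int = 10, lr: float = 0.5) -> List[Stump]:
    """Gradient boosting on the log loss; the raw score F starts at 0 (p = 0.5)."""
    F = [0.0] * len(X)
    stumps: List[Stump] = []
    for _ in range(n_trees):
        # Negative gradient of the log loss with respect to the raw score.
        r = [yi - sigmoid(fi) for yi, fi in zip(y, F)]
        stump = fit_stump(X, r)
        if stump is None:  # no feature has two distinct values
            break
        j, t, lv, rv = stump
        stumps.append(stump)
        F = [fi + lr * (lv if row[j] <= t else rv) for fi, row in zip(F, X)]
    return stumps


def leaf_one_hot(stumps: List[Stump], x: List[float]) -> List[float]:
    """Encode x by the leaf it reaches in each stump: a [left, right] block per tree."""
    out: List[float] = []
    for j, t, _, _ in stumps:
        out += [1.0, 0.0] if x[j] <= t else [0.0, 1.0]
    return out


def fit_logreg(Z, y, epochs: int = 500, lr: float = 0.5):
    """Batch gradient descent on the mean log loss; returns (weights, bias)."""
    n, d = len(Z), len(Z[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for z, yi in zip(Z, y):
            err = sigmoid(sum(wk * zk for wk, zk in zip(w, z)) + b) - yi
            for k in range(d):
                gw[k] += err * z[k]
            gb += err
        w = [wk - lr * g / n for wk, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict(stumps, w, b, x) -> float:
    """Probability that x is high-risk under the GBDT-leaf + LR model."""
    z = leaf_one_hot(stumps, x)
    return sigmoid(sum(wk * zk for wk, zk in zip(w, z)) + b)


if __name__ == "__main__":
    # Toy data: label 1 only when BOTH features are high.
    X = [[0.1, 0.1], [0.1, 0.9], [0.9, 0.1], [0.9, 0.9],
         [0.2, 0.8], [0.8, 0.2], [0.8, 0.8], [0.2, 0.2]]
    y = [0, 0, 0, 1, 0, 0, 1, 0]
    stumps = fit_gbdt(X, y, n_trees=2)
    w, b = fit_logreg([leaf_one_hot(stumps, x) for x in X], y)
    print([round(predict(stumps, w, b, x), 2) for x in X])
```

On the toy data, the boosted stumps find the split points (each feature at 0.5) and logistic regression weights the resulting leaf indicators so that only samples high on both features score above 0.5. With deeper trees, each leaf would encode a conjunction of conditions directly, which is the usual motivation for feeding tree leaves into a linear model; inference stays cheap because it is a few comparisons plus one dot product.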
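Because the module is meant to have a high recall rate, comparing algorithms calls for tracking recall on the high-risk class alongside overall accuracy. A minimal Python sketch follows; the selection rule in `pick_model` (discard models below a recall floor, then take the most accurate) is a hypothetical policy chosen for illustration, not the criterion stated in the thesis, and the model names in the usage lines are placeholders.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def accuracy_and_recall(y_true: Sequence[int], y_pred: Sequence[int],
                        positive: int = 1) -> Tuple[float, float]:
    """Overall accuracy and recall of the positive (high-risk) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, recall


def pick_model(y_true: Sequence[int], predictions: Dict[str, List[int]],
               min_recall: float) -> Optional[str]:
    """Hypothetical rule: among models meeting a recall floor, pick the most accurate."""
    scored = {name: accuracy_and_recall(y_true, pred) for name, pred in predictions.items()}
    eligible = {name: s for name, s in scored.items() if s[1] >= min_recall}
    if not eligible:
        return None
    return max(eligible, key=lambda name: eligible[name][0])


y_true = [1, 1, 1, 0, 0, 0, 0, 0]
preds = {
    "rules": [1, 0, 0, 0, 0, 0, 0, 0],
    "lr": [1, 1, 0, 0, 0, 0, 0, 0],
    "gbdt_lr": [1, 1, 1, 1, 1, 0, 0, 0],
}
best = pick_model(y_true, preds, min_recall=0.9)  # "gbdt_lr"
```

With a strict recall floor, a model that flags a few extra items can win over a more accurate but more conservative one, which matches the goal of not missing emergencies.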
Keywords/Search Tags: deep learning, machine learning, gradient boosting decision tree, high-risk information processing, natural language processing