Research On Network Undesirable Text Filtering Based On Social Platform

Posted on: 2022-08-08
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Wang
GTID: 2518306347951339
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of Internet technology, social platforms are bringing people ever closer together. Because these platforms are virtual and immediate, all kinds of complicated and chaotic information spread freely on the Internet, and short text is the main form in which undesirable information is disseminated. Accurately and effectively filtering the undesirable texts circulating on social platforms is therefore socially meaningful research. Natural language processing offers a feasible solution, but text on social platforms often does not conform to standard language rules and contains many word variants, mixed emotional language, and similar phenomena, all of which make filtering undesirable network text complex. To address this difficulty, this thesis proposes a filtering method built on a hybrid deep learning framework and adopts a label enhancement strategy based on reinforcement learning, so that the model grasps the purpose of label classification more quickly and filters undesirable network text effectively. The main research work is as follows.

First, a method for filtering undesirable network text is proposed that fuses the pre-trained model BERT with a convolutional neural network. The thesis recasts the filtering of undesirable network language as short text multi-class classification. Traditional short text classification requires several steps: feature selection, feature extraction, and classification. Features extracted by traditional machine learning methods often transfer poorly, which leads to poor classification results. In response, the thesis proposes BERT-CNN, a hybrid deep learning framework for undesirable text filtering that integrates the pre-trained model BERT with a Convolutional Neural Network (CNN). Experimental results show that the model effectively improves undesirable text classification performance on the test data set.

Second, a method for filtering undesirable network text based on reinforcement learning and label enhancement is proposed. Because short text is limited in length, it carries little semantic information. Because a numeric label serves only as a class index during classification, it also discards part of the semantic information, so the rich semantics implicit in the label need to be mined. The thesis therefore proposes two label enhancement strategies: one extracts a subset of the input text as an extended label, and the other uses a generative model to generate tags randomly. Experimental results show that undesirable text classification based on reinforcement learning and label enhancement effectively reduces the classification error rate.

Building on this research, the thesis constructs a filtering platform for undesirable Chinese network text on social platforms. Using the hybrid deep learning framework and the reinforcement learning method described above, the platform can automatically classify unlabeled undesirable texts, automatically identify new undesirable words on the Internet, and visualize the classification results.
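The abstract does not give the BERT-CNN architecture in detail, so the following is only a minimal sketch of the general idea: a convolutional layer slides filters over per-token vectors, max-pools each filter over the sequence, and a linear layer scores the classes. Everything here is an assumption made for illustration: the function names, the filter widths, the random weights, and in particular `encode_stub`, which is a hypothetical, non-contextual stand-in for BERT and does not reproduce its output.

```python
import random


def encode_stub(text, dim=4):
    """Hypothetical stand-in for BERT: one fixed pseudo-random vector per character.

    Real BERT returns contextual token vectors; this stub does not.
    """
    vectors = []
    for ch in text:
        rng = random.Random(ord(ch))  # same character -> same vector
        vectors.append([rng.uniform(-1.0, 1.0) for _ in range(dim)])
    return vectors


def conv1d_max_pool(token_vectors, kernel, bias=0.0):
    """Apply one filter to every window of consecutive tokens, then ReLU and max-pool."""
    width = len(kernel)
    best = 0.0  # ReLU output is never below zero
    for start in range(len(token_vectors) - width + 1):
        window = token_vectors[start:start + width]
        activation = bias + sum(
            w * x
            for kernel_row, vec in zip(kernel, window)
            for w, x in zip(kernel_row, vec)
        )
        best = max(best, activation)
    return best


def classify(token_vectors, filters, class_weights):
    """Pool one feature per filter, score each class linearly, return the argmax."""
    features = [conv1d_max_pool(token_vectors, k) for k in filters]
    scores = [sum(w * f for w, f in zip(row, features)) for row in class_weights]
    label = max(range(len(scores)), key=scores.__getitem__)
    return label, features


# Untrained, randomly initialised weights (assumed sizes, for illustration only).
rng = random.Random(0)
dim, num_classes = 4, 3
filters = [
    [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(width)]
    for width in (2, 3, 4)
]
class_weights = [[rng.uniform(-1, 1) for _ in filters] for _ in range(num_classes)]

label, features = classify(encode_stub("example short text"), filters, class_weights)
print(label, features)
```

In a real system the stub would be replaced by a pre-trained encoder and the weights would be learned from labeled data; the sketch only shows how convolution and pooling turn a variable-length sequence of token vectors into a fixed-length feature vector for multi-class classification.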
Keywords/Search Tags: Short Text Classification, Network Undesirable Text Filtering, Deep Learning, Reinforcement Learning