Research On Toxicity Detection Of Internet Speech Based On Deep Learning

Posted on: 2022-02-12
Degree: Master
Type: Thesis
Country: China
Candidate: J L Zhang
GTID: 2518306509954769
Subject: Software engineering

Abstract/Summary:
In recent years, as Internet platforms have opened up, many of them have added social channels to increase user engagement, and users can freely express their personal views and opinions. However, some users exploit the cross-temporal and transparent nature of these platforms to violate platform management regulations and wantonly publish toxic comments that undermine the healthy development of the country and social solidarity and stability, and that harm the physical and mental health of others, with a negative impact on society. Identifying toxic online speech is therefore of great significance for improving user experience, purifying the network environment and promoting the healthy development of society. To address the spread of toxic speech on the Internet and the lack of corresponding measures to manage it, this thesis proposes BLAM, a deep learning-based model for detecting toxic online speech. The research work covers the following aspects:

(1) Supervised pre-training of BERT for the toxicity scenario. Toxic speech is short text, with a maximum length of 220 words. The problem to be solved is the representation of short texts: how to make full use of the information in a text whose features and topic are far from obvious and whose length is very short, and how to identify uncivil semantics and toxic speech in it as fully as possible. The BERT pre-trained model is retrained on the experimental data set of this thesis to obtain a BERT model adapted to the toxicity scenario.

(2) Building a deep learning-based model for detecting toxic online speech. On top of the BERT model trained for the toxicity scenario, the deep neural network model BLAM is constructed by fusing a bidirectional long short-term memory network (Bi-LSTM), a self-attention mechanism (Self-Attention) and a global max pooling layer (Max-Pooling) to recognize toxic speech.

(3) Reducing unintended bias caused by identity information. A multi-task sharing mechanism is used to study the bias of the model under different tasks and different sharing mechanisms. The model trained under the hierarchical sharing mechanism, using the identity recognition task together with the toxic speech detection task, is found to have the lowest bias when detecting toxic speech that mentions an identity.

On the toxicity detection task, the BLAM model achieves a Recall of 88% and an AUC value of 95%. The experimental results verify the effectiveness of the model.
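
The last two stages named in aspect (2), self-attention followed by global max pooling, can be sketched roughly as below. This is a minimal NumPy illustration and not the thesis's implementation: the abstract does not specify the attention variant, so scaled dot-product attention is an assumption, as are the function names `self_attention`, `global_max_pool` and `toxicity_head`, the toy dimensions, and the sigmoid output layer. A random matrix stands in for the per-token outputs of BERT and the Bi-LSTM, and all weights are random and untrained.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(h, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a (seq_len, dim) matrix.

    Assumption: the abstract only says "self-attention"; this variant
    is chosen for illustration.
    """
    q, k, v = h @ w_q, h @ w_k, h @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ v, weights


def global_max_pool(h):
    """Keep the maximum of every feature across all token positions."""
    return h.max(axis=0)


def toxicity_head(h, params):
    """Self-attention -> global max pooling -> sigmoid score (hypothetical head)."""
    attended, _ = self_attention(h, params["w_q"], params["w_k"], params["w_v"])
    pooled = global_max_pool(attended)
    logit = pooled @ params["w_out"] + params["b_out"]
    return 1.0 / (1.0 + np.exp(-logit))


rng = np.random.default_rng(0)
seq_len, dim = 6, 8  # toy sizes, not taken from the thesis

# Stand-in for the per-token outputs of BERT followed by a Bi-LSTM.
h = rng.normal(size=(seq_len, dim))

# Random, untrained parameters used only to exercise the code path.
params = {
    "w_q": 0.1 * rng.normal(size=(dim, dim)),
    "w_k": 0.1 * rng.normal(size=(dim, dim)),
    "w_v": 0.1 * rng.normal(size=(dim, dim)),
    "w_out": 0.1 * rng.normal(size=dim),
    "b_out": 0.0,
}

prob = toxicity_head(h, params)
print(f"toxicity score (untrained, illustrative): {prob:.3f}")
```

Max pooling over positions lets a single strongly toxic token dominate the pooled feature, which is one plausible reason to prefer it over averaging for short comments; the abstract itself does not state the motivation.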
Keywords/Search Tags: toxic speech, multi-task learning, self-attention, BERT, Bi-LSTM