
Research on Cyberbullying Detection in Social Media

Posted on: 2021-02-28  Degree: Master  Type: Thesis
Country: China  Candidate: N J Lu
GTID: 2428330605450797  Subject: Information security
Abstract/Summary:
Cyberbullying on social media can cause serious harm, so detecting it effectively has important social and academic value. User-generated web text is noisy: it contains spelling errors, grammatical errors, and other irregularities that make bullying features difficult to learn. As a result, cyberbullying detection remains a difficult and open problem. To improve detection accuracy, this paper builds a neural network model that learns character-combination features and semantic features and uses shortcut connections to fuse them, reducing the interference caused by noise in user-generated content. The main work and contributions of this paper are as follows:

(1) To provide a Chinese dataset for cyberbullying detection, this paper builds an open-source Weibo dataset that allows a detection model to learn Chinese cyberbullying features from real-world scenarios. The data are Weibo users' comments on public figures who were involved in controversial events or drew negative reviews, and they are manually labeled so that the dataset can be used for supervised text classification.

(2) To learn the features of cyberbullying, this paper proposes the Char-CNNS model. To cope with noise in user-generated content, the model learns character-combination features and semantic features, and uses a shortcut strategy to fuse features from different levels of the network. Cyberbullying detection experiments on Weibo and Twitter datasets show that Char-CNNS outperforms the TF-IDF+SVM, N-gram+LR, and CNN baselines in Precision, Recall, and F-measure.

(3) To reduce the effect of class imbalance on cyberbullying detection, this paper adopts a cost-sensitive approach, using the Focal Loss function to improve the robustness of the model. Experiments show that, compared with the Cross Entropy loss, Focal Loss performs stably and raises the F1 score by 2.7%, 2.9%, 4.5%, 3.4%, and 8.3% on five datasets whose positive-to-negative ratios are 1:1, 1:2, 1:5, 1:10, and 1:20, respectively.
Keywords/Search Tags: cyberbullying, user-generated content, text classification, convolutional neural network
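The abstract does not spell out the Char-CNNS architecture, so the following is only a hypothetical sketch of the general idea in item (2): a character-level CNN in which stacked convolutions learn character-combination features, and a concatenation-style shortcut carries features from every level, including the raw character embeddings, to the output. It is an untrained forward pass in plain Python; the pseudo-random embeddings and filters, the layer sizes, and all function names are illustrative assumptions, not the author's implementation.

```python
import random

def embed(text, dim, seed=0):
    """Map each character to a fixed pseudo-random vector.
    Hypothetical stand-in for a learned character embedding table."""
    vectors = []
    for ch in text:
        rng = random.Random(f"{seed}:{ch}")  # same character -> same vector
        vectors.append([rng.uniform(-1.0, 1.0) for _ in range(dim)])
    return vectors

def pad(seq, width, dim):
    """Append zero vectors so a filter of size `width` fits at least once."""
    return seq + [[0.0] * dim for _ in range(max(0, width - len(seq)))]

def make_filters(n, width, in_dim, rng):
    """n random filters, each spanning `width` positions of `in_dim` channels."""
    return [[[rng.gauss(0.0, 0.5) for _ in range(in_dim)] for _ in range(width)]
            for _ in range(n)]

def conv1d_relu(seq, filters):
    """Valid 1-D convolution over positions, followed by ReLU."""
    width = len(filters[0])
    out = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        out.append([
            max(0.0, sum(w * x for wv, xv in zip(f, window) for w, x in zip(wv, xv)))
            for f in filters
        ])
    return out

def max_pool(seq):
    """Max over positions, one value per channel."""
    return [max(channel) for channel in zip(*seq)]

def char_cnn_shortcut_features(text, emb_dim=8, n_filters=6, width=3, seed=0):
    rng = random.Random(seed)
    # Level 0: character embeddings (zero-padded for very short inputs)
    x0 = pad(embed(text, emb_dim, seed), width, emb_dim)
    # Level 1: character-combination (n-gram-like) features
    x1 = conv1d_relu(x0, make_filters(n_filters, width, emb_dim, rng))
    # Level 2: combinations of combinations, i.e. a wider context window
    x2 = conv1d_relu(pad(x1, width, n_filters), make_filters(n_filters, width, n_filters, rng))
    # Shortcut fusion: pool every level and concatenate, so low-level
    # character features reach the output directly alongside deeper ones
    return max_pool(x0) + max_pool(x1) + max_pool(x2)
```

Working on characters rather than words means a misspelling such as "stupidd" still shares most of its character windows with the correct spelling. In a trained model the embeddings and filters would be learned and the fused vector passed to a classifier; an additive (residual) shortcut is a common alternative when the levels have equal width.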
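Item (3) replaces cross-entropy with Focal Loss. For a binary label, with p_t the predicted probability of the true class, the standard focal loss is FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t); with gamma = 0 and no alpha weighting it reduces to cross-entropy. The sketch below assumes that standard binary form. The abstract does not report the gamma or alpha values used, so the defaults here (gamma = 2, alpha = 0.25, common choices in the focal loss literature) are assumptions.

```python
import math

def cross_entropy(p, y, eps=1e-12):
    """Binary cross-entropy; p is the predicted probability that y == 1."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(max(p_t, eps))

def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Binary focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t).
    alpha=None disables class weighting (alpha_t = 1)."""
    p_t = p if y == 1 else 1.0 - p
    if alpha is None:
        alpha_t = 1.0
    else:
        # Per-class cost: alpha for positives, 1 - alpha for negatives
        alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, eps))

def mean_loss(loss_fn, probs, labels, **kwargs):
    """Average a per-example loss over a batch."""
    return sum(loss_fn(p, y, **kwargs) for p, y in zip(probs, labels)) / len(probs)
```

The (1 - p_t)^gamma factor is what down-weights easy examples: with gamma = 2, a positive predicted at p = 0.95 keeps only 0.0025 of its cross-entropy loss, while a misclassified positive at p = 0.3 keeps 0.49. Because the many easy negatives in an imbalanced set then contribute little, the rarer, harder bullying examples carry more of the training signal; alpha adds an explicit per-class cost on top.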
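The reported gains are in F1 rather than accuracy, which matters at the skewed ratios listed in item (3). The toy computation below uses hypothetical labels at a 1:20 positive-to-negative ratio (not data from the thesis) to show that a model predicting "not bullying" for everything scores about 95% accuracy yet has zero precision, recall, and F1 on the bullying class.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the positive (bullying) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical toy test set at a 1:20 positive:negative ratio
y_true = [1] * 5 + [0] * 100
# Hypothetical model A: always predicts "not bullying"
all_negative = [0] * 105
# Hypothetical model B: catches 4 of 5 positives, with 6 false alarms
some_hits = [1, 1, 1, 1, 0] + [1] * 6 + [0] * 94
```

Model A has the higher accuracy (100/105 versus 98/105) but is useless for detection, while model B reaches precision 0.4, recall 0.8, and F1 of about 0.53. This is why Precision, Recall, and F-measure are the more informative metrics for imbalanced cyberbullying data.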