
Research on Cyberbullying Language Based on Social Networks

Posted on: 2021-01-14
Degree: Master
Type: Thesis
Country: China
Candidate: L Qiang
GTID: 2428330602465439
Subject: Engineering
Abstract/Summary:
Over the past decade, more and more people have used social networks such as Sina Weibo, Facebook and Twitter, leading to exponential growth in both the number of users and the amount of user-generated content on these platforms. The spread of information helps new ideas circulate and promotes communication among people, but it also gives rise to attacks, abuse, slander and other forms of violent online language. Such cyberbullying language not only causes users mental and psychological distress, but also seriously damages the healthy environment of social networks. At present, most social networking platforms have not taken effective countermeasures: they filter and block only a small number of common abusive words. Because the number of text comments on a social network can reach tens of thousands or even millions, identifying cyberbullying language manually is infeasible. Research on methods for automatically recognizing and detecting cyberbullying language is therefore of great significance for intervening in online violence and cleaning up the online environment.

This paper addresses the characteristics and forms of cyberbullying language. Data were collected from Sina Weibo, and features were selected using a semi-supervised learning method with a small amount of human intervention. After eight iterations, we built a high-quality cyberbullying language corpus and, on this basis, studied methods for detecting cyberbullying text. We compared the classification performance of three machine learning models: SVM (support vector machine), NB (naive Bayes) and LR (logistic regression). Among them, SVM combined with N-gram features reached an accuracy of 78%. To further improve classification accuracy, and to address the problems of unstructured data and inaccurate Chinese word segmentation during text preprocessing, we use character embedding vectors as the input to a convolutional neural network. Experimental results show that the Char-CNN model improves accuracy, recall and F1 score compared with the other methods.
Keywords/Search Tags: Social Networks, Cyberbullying Language Corpus, Text Categorization, Char-CNN
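The abstract compares SVM, NB and LR classifiers on N-gram features. The sketch below shows one of those baselines, a multinomial naive Bayes classifier, in plain Python. It assumes character-level n-grams (1 to 2 characters), which avoid Chinese word segmentation entirely; the abstract does not say whether the thesis used word or character n-grams. The class name `CharNgramNB`, the helper `char_ngrams` and the toy training sentences are hypothetical illustrations, not data or code from the thesis, and a real system would normally use a library implementation such as scikit-learn.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data for illustration only (not the thesis corpus).
TRAIN_TEXTS = ["你真是个废物", "滚出去废物", "垃圾东西闭嘴",
               "今天天气真好", "谢谢你的分享", "这篇文章写得很好"]
TRAIN_LABELS = ["bullying", "bullying", "bullying", "normal", "normal", "normal"]


def char_ngrams(text, n_min=1, n_max=2):
    """Return all character n-grams of length n_min..n_max (no word segmentation)."""
    grams = []
    for n in range(n_min, n_max + 1):
        grams.extend(text[i:i + n] for i in range(len(text) - n + 1))
    return grams


class CharNgramNB:
    """Multinomial naive Bayes over character n-gram counts with Laplace smoothing."""

    def __init__(self, n_min=1, n_max=2, alpha=1.0):
        self.n_min, self.n_max, self.alpha = n_min, n_max, alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class (for priors)
        self.gram_counts = defaultdict(Counter)  # class -> n-gram -> count
        self.vocab = set()
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n_min, self.n_max)
            self.gram_counts[label].update(grams)
            self.vocab.update(grams)
        self.totals = {c: sum(cnt.values()) for c, cnt in self.gram_counts.items()}
        return self

    def predict(self, text):
        grams = [g for g in char_ngrams(text, self.n_min, self.n_max) if g in self.vocab]
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            score = math.log(doc_count / n_docs)  # log prior
            for g in grams:
                # smoothed log likelihood of each in-vocabulary n-gram
                count = self.gram_counts[label][g]
                score += math.log((count + self.alpha) / (self.totals[label] + self.alpha * v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    clf = CharNgramNB().fit(TRAIN_TEXTS, TRAIN_LABELS)
    print(clf.predict("你这个废物"), clf.predict("谢谢分享"))
```

Out-of-vocabulary n-grams are skipped rather than smoothed, so unseen characters do not push the score toward whichever class happens to have fewer training n-grams.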
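The Char-CNN described in the abstract feeds character embedding vectors into a convolutional network, so no word segmentation is needed. The abstract gives no architecture details, so the following is only a minimal sketch of the forward pass under assumed choices: one filter width, ReLU, max-over-time pooling and a sigmoid output, which is a common design for text CNNs. The class name `TinyCharCNN`, its parameters (`embed_dim`, `num_filters`, `kernel_size`) and the vocabulary are hypothetical, the weights are random and untrained, and training (normally done in a deep learning framework) is omitted.

```python
import math
import random


class TinyCharCNN:
    """Forward pass of a minimal character-level CNN (hypothetical, untrained weights)."""

    def __init__(self, vocab, embed_dim=8, num_filters=4, kernel_size=3, seed=0):
        rng = random.Random(seed)
        # index 0 is shared by unknown characters and padding
        self.index = {ch: i + 1 for i, ch in enumerate(sorted(vocab))}
        self.embed = [[rng.gauss(0, 0.1) for _ in range(embed_dim)]
                      for _ in range(len(self.index) + 1)]
        self.k = kernel_size
        # each filter spans kernel_size consecutive character vectors
        self.filters = [[[rng.gauss(0, 0.1) for _ in range(embed_dim)] for _ in range(kernel_size)]
                        for _ in range(num_filters)]
        self.bias = [0.0] * num_filters
        self.out_w = [rng.gauss(0, 0.1) for _ in range(num_filters)]
        self.out_b = 0.0

    def embed_text(self, text):
        ids = [self.index.get(ch, 0) for ch in text]
        ids += [0] * max(0, self.k - len(ids))  # pad short inputs to at least one window
        return [self.embed[i] for i in ids]

    def _window_score(self, filt, bias, window):
        # dot product between a (kernel_size x embed_dim) filter and a window of character vectors
        return bias + sum(w * v for row_w, row_x in zip(filt, window) for w, v in zip(row_w, row_x))

    def features(self, text):
        x = self.embed_text(text)
        pooled = []
        for filt, b in zip(self.filters, self.bias):
            # valid 1-D convolution over character positions, followed by ReLU
            acts = [max(0.0, self._window_score(filt, b, x[t:t + self.k]))
                    for t in range(len(x) - self.k + 1)]
            pooled.append(max(acts))  # max-over-time pooling: one value per filter
        return pooled

    def predict_proba(self, text):
        h = self.features(text)
        z = self.out_b + sum(w * v for w, v in zip(self.out_w, h))
        return 1.0 / (1.0 + math.exp(-z))  # probability of the "bullying" class


if __name__ == "__main__":
    model = TinyCharCNN(set("你真是个废物今天天气好"))
    print(model.features("你真是个废物"), model.predict_proba("你真是个废物"))
```

Max-over-time pooling turns a comment of any length into a fixed-size vector with one value per filter, which is what lets a single output layer score both short and long comments.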