Research On Cyberbullying Language Based On Social Networks

Posted on:2021-01-14

Degree:Master

Type:Thesis

Country:China

Candidate:L Qiang

Full Text:PDF

GTID:2428330602465439

Subject:Engineering

Abstract/Summary:

Over the past decade, more and more people have used social networks such as Sina Weibo, Facebook and Twitter, leading to an exponential increase in the number of users and the amount of user-generated content on these platforms. The spread of information helps transmit new ideas and promotes communication between people, but it also gives rise to attacks, abuse, slander and other forms of violent language online. Such cyberbullying language not only causes users mental and psychological pain, but also seriously damages the harmonious environment of social networks. At present, most social networking platforms have not taken effective measures; they filter and block only a small number of common abusive words. Because the number of text comments on social networks reaches tens of thousands or even millions, it is impossible to identify cyberbullying language manually. Research on methods for automatically recognizing and detecting cyberbullying language is therefore of great significance for intervening in online violence and cleaning up the network environment. This thesis focuses on the characteristics and forms of cyberbullying language. Data were collected from Sina Weibo, and features were selected using a semi-supervised learning method with a small amount of human intervention. After eight iterations, we established a high-quality corpus of cyberbullying language and, on this basis, studied methods for detecting cyberbullying language in text. The thesis compares the classification performance of three machine learning models: SVM, NB and LR. Among them, SVM combined with N-gram features reaches an accuracy of 78%. To improve the accuracy of text categorization and to address the problems of unstructured data and inaccurate Chinese word segmentation in text preprocessing, we use character embedding vectors as the input to a convolutional neural network model. The experimental results show that the accuracy, recall, and F1 score of the Char-CNN model are improved compared with the other models.
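The idea of classifying Chinese text from character-level features, without a word segmentation step, can be illustrated with a minimal sketch. It is not the thesis's implementation: it uses a small multinomial Naive Bayes classifier (NB is one of the three baselines named above) over character unigrams and bigrams, written with the Python standard library only. The function `char_ngrams`, the class `NaiveBayes`, the label names, and the four training sentences are all hypothetical and invented for illustration; they are not drawn from the thesis corpus and say nothing about its reported results.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_min=1, n_max=2):
    """Return character n-grams of length n_min..n_max (no word segmentation)."""
    return [text[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(text) - n + 1)]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.feature_counts[label].update(char_ngrams(text))
        self.vocab = {g for c in self.feature_counts.values() for g in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(doc_count / total_docs)  # log prior
            for gram in char_ngrams(text):
                if gram in self.vocab:  # skip n-grams unseen in training
                    score += math.log((counts[gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, invented for this sketch (assumption, not thesis data).
train_texts = ["你真蠢", "滚开废物", "今天天气好", "谢谢分享"]
train_labels = ["bullying", "bullying", "neutral", "neutral"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("你是废物"))
print(model.predict("谢谢你的分享"))
```

Working on characters rather than words sidesteps segmentation errors, which is the same motivation the abstract gives for feeding character embeddings to the Char-CNN model. A real experiment would need a labeled corpus of realistic size and a held-out test set to measure accuracy, recall, and F1.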

Keywords/Search Tags:

Social Networks, Cyberbullying Language Corpus, Text Categorization, Char-CNN

Related items

1	Research And Implementation On Automatic Construction System For Text Categorization Corpus
2	An Automatic Chinese Text Categorization System Based On Statistical Language Model
3	Research On Cyberbullying Detection In Social Media
4	Language Independent Text Categorization
5	Research On Language Independent Text Categorization
6	A Study On Text Categorization Based On Machine Learning
7	Study On Cross Language Text Categorization
8	The Research On Cross Language Text Categorization Based On Interlingua Semantic
9	Chinese Text Categorization On Weapon Corpus
10	Research And Improvement On Automatic Construction System For Text Categorization Corpus