Classification Of Sexual Harassment Dialogue Texts Based On BERT-CNN

Posted on:2022-12-20

Degree:Master

Type:Thesis

Country:China

Candidate:M R Yan

GTID:2518306770971739

Subject:Automation Technology

Abstract/Summary:

In recent years, the spread of the mobile Internet has further expanded the user base of online social media platforms, and more and more people post and share their views and opinions on various events on these platforms. However, the content exchanged on these platforms is often poorly filtered, so users may send or receive various kinds of sexually harassing messages. Such messages can seriously harm the recipient, leading to low self-esteem and depression, and even to self-destructive, suicidal, and anti-social behaviour. This phenomenon is already commonplace. However, the vast amount of data these platforms generate every day makes it impractical for regulators or other practitioners to review it case by case. Moreover, online discourse is dynamic, which makes it difficult for reviewers to define a qualitative criterion that separates sexually harassing texts from other texts. Automatically detecting and classifying such messages is therefore valuable. This dissertation proposes an automatic classification model for sexually harassing conversational texts. Its main contributions are as follows:

(1) To address the lack of a Chinese-language dataset of sexual harassment conversations, this dissertation constructs one. After cleaning, the conversations are annotated by level. The annotation is based on the degree to which the harassed person can tolerate the discourse, and the dialogue texts are divided into four levels of increasing severity. The dataset is also augmented with a translation-based approach to make the class distribution more even, which helps the model learn more semantic information during training.

(2) Keyword-matching methods are too inaccurate and frequently misclassify texts. To address this, this dissertation uses the pre-trained model BERT to generate word vectors in place of a traditional word-embedding model. The overall semantic information of a sentence is represented by the [CLS] token, which is then classified by a linear classifier.

(3) To further exploit the rich semantic information in BERT, this thesis extends BERT's output with a multi-layer convolutional neural network that extracts additional features, uses max pooling for dimensionality reduction, applies a ReLU function to reduce the computational effort, and finally classifies with a linear classifier.

(4) Extensive experiments demonstrate that the proposed model is effective in classifying online conversational texts. The experiments also show that the convolutional neural network-based model is more likely to classify correctly when the main meaning of a sentence is determined by a few local key phrases.
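The two classification heads in contributions (2) and (3) can be sketched as a forward pass over BERT's token outputs. This is a minimal NumPy sketch under stated assumptions, not the thesis's implementation: the BERT encoder is replaced by a random matrix standing in for its last-layer hidden states (hidden size 768, as in BERT-base, with [CLS] at position 0); the two stacked convolution layers, kernel width 3, 64 filters, and the ReLU between convolution layers are hypothetical choices; weights are random rather than trained. The function names `conv1d`, `cls_head`, and `bert_cnn_head` are illustrative only.

```python
import numpy as np

NUM_LEVELS = 4  # the dataset annotates four harassment levels


def conv1d(x, w, b):
    """Valid 1-D convolution over the token axis.

    x: (seq_len, in_dim), w: (kernel, in_dim, out_dim), b: (out_dim,)
    returns: (seq_len - kernel + 1, out_dim)
    """
    k = w.shape[0]
    steps = x.shape[0] - k + 1
    out = np.empty((steps, w.shape[2]))
    for t in range(steps):
        window = x[t:t + k]  # (kernel, in_dim) slice of consecutive tokens
        out[t] = np.einsum("ki,kio->o", window, w) + b
    return out


def relu(z):
    return np.maximum(z, 0.0)


def cls_head(hidden, w_cls, b_cls):
    """Baseline (2): linear classifier on the [CLS] vector at position 0."""
    return hidden[0] @ w_cls + b_cls


def bert_cnn_head(hidden, convs, w_lin, b_lin):
    """Variant (3): stacked convs -> max pool over tokens -> ReLU -> linear."""
    h = hidden
    for i, (w, b) in enumerate(convs):
        h = conv1d(h, w, b)
        if i < len(convs) - 1:
            h = relu(h)  # assumption: nonlinearity between conv layers
    pooled = h.max(axis=0)  # max pooling over the token axis
    return relu(pooled) @ w_lin + b_lin


rng = np.random.default_rng(0)
seq_len, hidden_dim, filters = 32, 768, 64
# Hypothetical stand-in for BERT's last-layer hidden states.
hidden = rng.standard_normal((seq_len, hidden_dim))

convs = [
    (rng.standard_normal((3, hidden_dim, filters)) * 0.02, np.zeros(filters)),
    (rng.standard_normal((3, filters, filters)) * 0.02, np.zeros(filters)),
]
w_lin = rng.standard_normal((filters, NUM_LEVELS)) * 0.02
b_lin = np.zeros(NUM_LEVELS)
w_cls = rng.standard_normal((hidden_dim, NUM_LEVELS)) * 0.02

cls_logits = cls_head(hidden, w_cls, b_lin)
logits = bert_cnn_head(hidden, convs, w_lin, b_lin)

# Softmax over the four levels (shifted by the max for numerical stability).
e = np.exp(logits - logits.max())
probs = e / e.sum()
print("predicted level index:", probs.argmax())
```

Because ReLU is monotonic, applying it after max pooling gives the same result as applying it before pooling, but on a single pooled vector instead of at every token position; this is one way to read the claim that this ordering reduces computation.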

Keywords/Search Tags:

natural language processing, deep learning, feature extraction, text classification, crime prevention

Related items

1	Intelligent Device Text Classification Method Based On Natural Language Processing
2	Research On Text Classification Based On Natural Language Processing And Machine Learning
3	Text Classification Based On Natural Language Processing, Analysis And Research
4	Research On Network Text Sentiment Classification Based On Deep Learning
5	Research On Machine Learning For Natural Language Processing And Transmission
6	Research On Information Parsing Based On Text Classification
7	Research On Text Classification Based On Deep Neural Network
8	Research On Deep Learning Methods For Text Classification Tasks
9	Research And Analysis Of Text Classification Theory Based On Deep Learning
10	Research And Application Of Text Classification Based On Deep Learning