With the popularity of social platforms, users increasingly use them to post content. This inevitably generates conflicts during communication, and sometimes offensive language or bullying behavior directed at individuals or groups. It is therefore crucial to study automated methods for detecting offensive language and bullying behavior in order to reduce their negative social impact. The relevant sub-tasks include offensive text classification and group bullying detection in complex scenarios. Previous studies have mainly formulated these as a sentence-level classification task and a session-level classification task, respectively. In doing so, they ignore the key role of more fine-grained auxiliary tasks such as keyword identification and offensive comment identification, which can enhance the interpretability of the model and produce more reliable predictions. Moreover, previous work that does introduce auxiliary tasks usually requires manually annotated external data. Manual annotation is time-consuming and labor-intensive, and exposure to large numbers of offensive comments may seriously harm the mental health of the annotators. Meanwhile, because users interact with one another on social platforms, group bullying consisting of many offensive comments forms easily, yet the interactions between comments have not been fully explored in previous work.

To address these challenges, this thesis first proposes an offensive text classification framework that incorporates a word-level classification task, allowing the model to capture both global and local features of the data and produce better sentence representations. The framework also facilitates the automatic construction of a category lexicon during model learning, without manual annotation. Second, this thesis proposes a multi-task framework that introduces sentence-level offensive comment detection as an auxiliary task to improve session-level bullying detection. Specifically, the two tasks share a text encoding module and a social graph learning module to obtain comment representations from different aspects. The social graph learning module models the interactions among comments in a session to capture the repetitive nature of bullying behavior. In addition, this thesis proposes a weakly-supervised comment annotator to overcome the limitation that the auxiliary task requires manually annotated comment-level labels. Extensive experiments on publicly available datasets verify the effectiveness of both proposed algorithms.

Finally, this thesis designs and implements an offensive language and bullying behavior detection system based on the above approaches. The system provides a visual operation interface and offers users two sub-services: offensive text classification and cyberbullying detection. Each service provides data preprocessing, model training, model testing, and model application functions, making the system convenient to use.
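The abstract does not specify how the category lexicon is derived from the word-level classification task. The following is a minimal sketch that assumes the word-level classifier emits a probability for each category for every token. One simple way to build a lexicon from such outputs is to average each token's predicted probabilities over the corpus and keep tokens whose mean probability for a category reaches a threshold. The function name `build_category_lexicon`, the `threshold` and `min_count` parameters, and the category names are hypothetical illustrations, not the method used in this thesis.

```python
from collections import defaultdict


def build_category_lexicon(token_predictions, threshold=0.5, min_count=2):
    """Hypothetical sketch: derive a per-category lexicon from word-level predictions.

    token_predictions: iterable of (token, {category: probability}) pairs,
    assumed to be collected from a word-level classification head.
    A token joins a category's lexicon when it occurs at least `min_count`
    times and its mean predicted probability for that category is at
    least `threshold`.
    """
    # Accumulate probability mass per (token, category) and occurrence counts.
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for token, probs in token_predictions:
        counts[token] += 1
        for category, p in probs.items():
            sums[token][category] += p

    # Keep frequent tokens whose mean probability clears the threshold.
    lexicon = defaultdict(set)
    for token, n in counts.items():
        if n < min_count:
            continue  # too rare to trust the averaged prediction
        for category, total in sums[token].items():
            if total / n >= threshold:
                lexicon[category].add(token)
    return {category: sorted(words) for category, words in lexicon.items()}


# Usage with made-up predictions (hypothetical tokens and categories).
preds = [
    ("idiot", {"insult": 0.9, "neutral": 0.1}),
    ("idiot", {"insult": 0.8, "neutral": 0.2}),
    ("hello", {"insult": 0.1, "neutral": 0.9}),
    ("hello", {"insult": 0.2, "neutral": 0.8}),
    ("rare", {"insult": 0.95, "neutral": 0.05}),
]
print(build_category_lexicon(preds))
```

The `min_count` filter is a design choice in this sketch: a single confident prediction on a rare token (such as `"rare"` above) is excluded, which trades lexicon coverage for reliability.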