
Research on Detection and Recognition of Network Offensive Speech Based on Multi-task Learning

Posted on: 2022-10-26
Degree: Master
Type: Thesis
Country: China
Candidate: K L Wan
GTID: 2518306551970699
Subject: Master of Engineering
Abstract/Summary:
The rapid development of the Internet has changed the way people live and entertain themselves, and it has provided more convenient social channels. While communication and sharing are no longer restricted by distance, problems have followed. Offensive speech floods the online world and damages civil online communication. Offensive speech is text that targets a specific individual or group, or that can cause discomfort to the reader, and it is common on social media platforms. Accurate automatic detection tools can effectively curb its spread, so this research focuses on machine learning methods for detecting and recognizing offensive speech.

The subtasks related to offensive speech include detection tasks and recognition tasks. The goal of the detection task is to determine whether a text is offensive. The difficulty is that offensiveness is expressed in many ways, and the analysis often has to take context into account. Sample texts are generally short and the amount of data is small, so the information available is very limited. The goal of the recognition task is to determine the target at which the offensive text is directed. Its core is to focus on the offensive parts of the text and on the objects those parts modify. In response to these key points and difficulties, this thesis carries out the following work:

(1) To address the difficulties of the detection task, this thesis proposes a BERT-based multi-task learning model for offensive speech detection. The main idea is to introduce more useful information for the detection task while making effective use of the text context. The pretrained BERT model provides context-sensitive word representations for the text and carries additional language information learned from a large-scale corpus. The multi-task learning framework obtains features from specific auxiliary tasks and supplies the main task with feature information that supports its goal.

(2) Since no relevant Chinese data set is publicly available, this thesis collects and annotates user comments on Sina Weibo, following the scale of the relevant English data set, and builds a small Chinese Weibo offensive speech data set to verify the validity of the detection model in the Chinese scenario. The experimental results show that the detection model remains effective on the Chinese data set.

(3) To address the focus of the recognition task, this thesis proposes an attention-fused, BERT-based multi-task learning model for recognition. The main idea is to use the attention mechanism to focus on the parts of the text that express the offense and on the objects they modify. The model retains the advantages of the BERT model and the multi-task learning framework. In addition, an attention module is integrated into the structure to select the feature information best suited to the task, so that more accurate judgments can be made.
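The combination described in (1) and (3), a shared representation feeding a main task and an auxiliary task with attention selecting the most relevant tokens, can be illustrated with a small sketch. This is not the thesis's architecture, which the abstract does not specify in detail. It assumes random vectors standing in for BERT token outputs, a single learned query vector for attention pooling, a hypothetical two-class main task and three-class auxiliary task, and a hypothetical weighting factor `aux_weight`. It only computes the joint loss; no training is performed.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attention_pool(token_vectors, query):
    """Weight each token vector by softmax(query . token) and sum them."""
    weights = softmax([dot(query, t) for t in token_vectors])
    dim = len(token_vectors[0])
    pooled = [sum(w * t[i] for w, t in zip(weights, token_vectors))
              for i in range(dim)]
    return pooled, weights


def linear_head(features, weight_rows, bias):
    """One logit per class: weight_rows[c] . features + bias[c]."""
    return [dot(row, features) + b for row, b in zip(weight_rows, bias)]


def cross_entropy(logits, label):
    return -math.log(softmax(logits)[label])


def multitask_loss(token_vectors, params, main_label, aux_label, aux_weight=0.5):
    """Joint loss: main-task loss plus a weighted auxiliary-task loss.

    Both heads read the same attention-pooled features, so in training
    the auxiliary task would shape the shared representation.
    """
    pooled, weights = attention_pool(token_vectors, params["query"])
    main_logits = linear_head(pooled, params["main_w"], params["main_b"])
    aux_logits = linear_head(pooled, params["aux_w"], params["aux_b"])
    loss = (cross_entropy(main_logits, main_label)
            + aux_weight * cross_entropy(aux_logits, aux_label))
    return loss, weights


# Stand-in data: random vectors in place of real BERT token outputs (assumption).
rng = random.Random(0)
dim, n_tokens = 4, 5
tokens = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_tokens)]
params = {
    "query": [rng.uniform(-1, 1) for _ in range(dim)],
    "main_w": [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(2)],
    "main_b": [0.0, 0.0],
    "aux_w": [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(3)],
    "aux_b": [0.0, 0.0, 0.0],
}
loss, weights = multitask_loss(tokens, params, main_label=1, aux_label=2)
print("attention weights:", [round(w, 3) for w in weights])
print("joint loss:", round(loss, 4))
```

The attention weights show which tokens contribute most to the pooled features, and setting `aux_weight` to 0 reduces the joint loss to the main-task loss alone.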
Keywords/Search Tags: Offensive Speech, Text Classification, BERT Model, Multi-task Learning, Attention Mechanism