Research On The Improvement Of Multi-label Text Classification Algorithm For Offensive Language In Social Media

Posted on: 2024-03-28
Degree: Master
Type: Thesis
Country: China
Candidate: B L Guo
Full Text: PDF
GTID: 2568307067463574
Subject: Engineering
Abstract/Summary:
Because social media is so widely used and its users differ so greatly, offensive speech such as cyberbullying, gender antagonism, abuse, and hate speech is prominent. It seriously affects users' physical and mental health, damages the online environment, and undermines the construction of a harmonious society. Precise classification of offensive texts in social media applications has therefore become an urgent task. Offensive texts on social media are diverse, with rich semantics and fine granularity, so traditional single-label text classification struggles to classify them accurately. To classify offensive texts accurately and at a fine granularity using this rich semantic information, we improve multi-label classification of offensive speech in three respects: sequential dependence, polysemy, and error propagation.
(1) A multi-label classification method for offensive texts in social media based on a joint embedding mechanism is proposed. The joint embedding mechanism guides the model to pay more attention to the dependency between semantic information and label features in text sequences, the dependency between label features and keyword features, and the correlation between text semantic information and keyword information, which reduces the model's reliance on the order of text sequences and label sequences. In addition, multi-task learning is used to guide the model to learn in a targeted way for each task, improving its generalization ability. The experimental results show that the keyword information in the joint embedding mechanism effectively improves the model's F1 score and accuracy on the multi-label classification task.
(2) A multi-label classification method for offensive texts in social media based on a mutual information fusion mechanism is proposed. The mutual information fusion mechanism learns the higher-order semantic association between semantic information and label features in text sequences, mitigates the polysemy of the same word in different contexts, helps the model learn deeper contextual semantic representations, and enhances classification performance. The experimental results show that the model exceeds the benchmark model in F1 score, and the ablation experiment verifies that integrating the higher-order semantic association between semantic information and label information into training helps the model make multi-label classification decisions.
(3) A multi-label classification method for offensive texts in social media based on an attention enhancement mechanism is proposed. In this method, an expectation gate mechanism guides the model to learn discriminative feature information, reduces the negative influence of error propagation, and enhances classification accuracy. The experimental results show that the model outperforms the benchmark model on several performance indicators, and the ablation experiments verify that introducing the expectation gate mechanism during training lets the model learn features that are more sensitive to the classification decision.
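
The abstract gives no implementation details, so the following is only a generic sketch of what separates multi-label from single-label classification and of how an F1 score can be computed in that setting; it is not the thesis's method. It assumes a model that emits one logit per label, decides each label independently through a sigmoid threshold, and is scored with micro-averaged F1. The label names, the 0.5 threshold, and the function names are hypothetical.

```python
import math

# Hypothetical label set, borrowed from the kinds of offensive speech named above.
LABELS = ["cyberbullying", "gender_antagonism", "abuse", "hate_speech"]


def sigmoid(x):
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_labels(logits, threshold=0.5):
    """Return one 0/1 decision per label.

    Unlike single-label softmax + argmax, every label is decided
    independently, so a text can receive zero, one, or several labels.
    """
    return [1 if sigmoid(z) >= threshold else 0 for z in logits]


def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP/FP/FN over all texts and all labels."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    decisions = predict_labels([2.0, -1.0, 0.3, -3.0])
    print([name for name, d in zip(LABELS, decisions) if d])
    gold = [[1, 0, 1, 0], [0, 1, 0, 0]]
    pred = [[1, 0, 0, 0], [0, 1, 1, 0]]
    print(round(micro_f1(gold, pred), 4))
```

Micro-averaging is only one choice; macro-averaged F1, which averages per-label scores, weights rare labels more heavily, and the abstract does not say which variant the thesis reports.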
Keywords/Search Tags: multi-label text classification, multi-task learning, condition label co-occurrence forecast, semantic association, dependencies