
Research On User Comment-oriented Network Toxic Behavior Detection Technology

Posted on: 2021-03-17
Degree: Master
Type: Thesis
Country: China
Candidate: S Li
Full Text: PDF
GTID: 2428330647461945
Subject: Engineering
Abstract/Summary:
In recent years, with the growth of online social activity, more and more people express their opinions and emotions on the Internet, and the number of online comments keeps increasing. Extracting, classifying, and analyzing these data helps us understand the opinions and emotions that different people express about different things. The analysis of online comment data from social networks is also of great importance to public opinion analysis in network intelligence and network security. Therefore, it is particularly important to study the related data analysis techniques. Although many detection methods have been proposed in different fields and disciplines, current detection models still have deficiencies for different data: 1) toxic comment text is large in volume, noisy, quickly updated, and targeted, so existing detection models cannot be applied directly to toxic comment detection; 2) toxic comments exhibit a serious class imbalance, which makes toxic behavior detection more difficult and less efficient; 3) traditional models have a simple structure, weak learning ability, and difficulty extracting features, so the detection model fits poorly. In response to these three problems, this dissertation mainly did the following work:

(1) To address class imbalance and sample overlap, a toxic behavior detection model (SSU-BG) based on sample stratification undersampling and a Bi-GRU network is proposed. First, the comment text is preprocessed and a feature model is constructed based on the characteristics of the data set. Then, using the Euclidean distance, the highest-density point of the high-frequency samples and the average intra-class distance are calculated, and the high-frequency class is divided into a dense area, a sparse area, and the boundary layer of the sparse area. The dense area is further divided into ring-shaped sampling areas of different levels according to the number of sample labels, and the overall sampling ratio is measured according to the average imbalance of the various classes of samples. The domain layer is randomly undersampled to solve the class imbalance and overlap problems at the data level. Finally, the text vector is input into the trained Bi-GRU model. The experimental results show that, across 21 groups of experiments, the detection accuracy of the SSU-BG model was up to 9.98% higher and its error rate up to 39.14% lower than those of other models, and the other two evaluation indexes also improved. Therefore, the model not only improved the overall detection effect but also solved the problems of class imbalance and sample overlap.

(2) To address class imbalance and the inadequate setting of the sampling ratio in the SMOTE algorithm, a toxic behavior detection model (AS-BL) based on an improved SMOTE combined with a Bi-LSTM network is proposed. First, the data set is preprocessed and the feature model is further expanded. Because toxic words are mostly concentrated at the end of a sentence, the end-of-sentence words are added to the vectorization to improve classification accuracy. Then, the K-Nearest Neighbor algorithm is used to calculate the average density of all high-frequency samples, the sampling ratio is calculated from this average density, and new samples are generated near the existing samples according to the sampling ratio and placed into the low-frequency sample set. Finally, the text vector is input into the trained Bi-LSTM model. The experimental results show that the detection accuracy of the AS-BL model is 0.8% higher and its error rate 0.52% lower than those of the SSU-BG model proposed in point (1). Similarly, the other two evaluation indexes are also improved, and the deficiencies of class imbalance and of the SMOTE algorithm are solved...
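The oversampling step in point (2) can be illustrated with a minimal sketch of standard SMOTE-style interpolation. This is not the thesis's improved SMOTE: the abstract does not give the density-based formula for the sampling ratio, so the sketch simply takes the ratio as a parameter. The function names `smote_like_oversample` and `balance`, the parameter `ratio`, and the toy vectors are all hypothetical and used only for illustration.

```python
import numpy as np


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic samples by interpolating between a randomly
    chosen low-frequency sample and one of its k nearest neighbours."""
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=float)
    n = len(minority)
    if n < 2:
        raise ValueError("need at least two low-frequency samples")
    k = min(k, n - 1)

    # Pairwise Euclidean distances inside the low-frequency class.
    diffs = minority[:, None, :] - minority[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]

    base = rng.integers(0, n, size=n_new)   # sample to start from
    pick = rng.integers(0, k, size=n_new)   # which neighbour to use
    partner = neighbours[base, pick]
    gap = rng.random((n_new, 1))            # interpolation factor in [0, 1)
    return minority[base] + gap * (minority[partner] - minority[base])


def balance(majority, minority, ratio=1.0, k=3, seed=0):
    """Grow the low-frequency set to ratio * len(majority) samples.
    ASSUMPTION: ratio is supplied by the caller; the thesis derives it
    from an average density that the abstract does not specify."""
    minority = np.asarray(minority, dtype=float)
    n_target = int(round(ratio * len(majority)))
    n_new = max(0, n_target - len(minority))
    synthetic = smote_like_oversample(minority, n_new, k=k, seed=seed)
    return np.vstack([minority, synthetic])


# Toy text vectors: 10 high-frequency samples, 4 low-frequency samples.
high_freq = np.zeros((10, 2))
low_freq = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
balanced_low = balance(high_freq, low_freq, ratio=1.0, k=2)
print(balanced_low.shape)
```

Each synthetic sample lies on the line segment between two existing low-frequency samples, which is why the generated points stay close to the original ones. A density-based ratio, as the abstract describes, would replace the fixed `ratio` argument.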
Keywords/Search Tags:Social networks, Toxic comments, Data imbalance, Stratification undersampling, AD-SMOTE, Text detection