A Classification System For Network Violation Information Based On Machine Learning

Posted on: 2020-12-29
Degree: Master
Type: Thesis
Country: China
Candidate: J J Shao
GTID: 2438330575960694
Subject: Applied statistics
Abstract/Summary:
With the rapid development of the Internet, illegal information on the network has begun to increase. In view of the successful application of machine learning classification algorithms to spam classification, real and fake review recognition, and similar tasks, this paper studies the use of machine learning algorithms to analyze textual data with more complex forms and more content, in order to classify and identify the chat accounts behind it. The result is divided into two categories: normal accounts and suspected offending accounts. The data are chat records from a game company's chat software over a certain period of time. After data preprocessing such as word segmentation, keyword extraction, and feature selection, the data are split into a training set and a testing set; the machine learning algorithms are fitted on the training set and their effect is evaluated on the testing set. In the word segmentation stage, this paper mainly uses the jieba package in Python to obtain the word segmentation results after deleting stop words. In the keyword extraction stage, this paper mainly uses the TF-IDF algorithm to extract keyword features. In the classification stage, both the Decision Tree and Random Forest algorithms are used, and their results are compared. True Positive Recall (TPR), True Negative Recall (TNR), and G-means are chosen as performance metrics. The Random Forest is found to give better results than the Decision Tree. Under the Random Forest algorithm, when the number of keywords is set to 110, the TPR reaches its highest value, 0.857, and the TNR also reaches its highest value, 0.963, which indicates a good classification effect.
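The three performance metrics named in the abstract can be illustrated with a minimal sketch. It assumes the usual definitions, which the abstract does not spell out: TPR is the recall on the positive class, TNR is the recall on the negative class, and G-means is the geometric mean of the two. The function name `recall_metrics`, the label encoding (1 for a suspected offending account, 0 for a normal account), and the toy labels are all hypothetical and are not taken from the thesis.

```python
import math


def recall_metrics(y_true, y_pred, positive=1):
    """Return (TPR, TNR, G-mean) for binary labels.

    Assumed definitions (not stated in the abstract):
      TPR = TP / (TP + FN), TNR = TN / (TN + FP), G-mean = sqrt(TPR * TNR).
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    # Guard against a class that is absent from y_true.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return tpr, tnr, math.sqrt(tpr * tnr)


# Toy labels for illustration only: 1 = suspected offending account, 0 = normal account.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
tpr, tnr, g_mean = recall_metrics(y_true, y_pred)
print(f"TPR={tpr:.3f} TNR={tnr:.3f} G-mean={g_mean:.3f}")
```

Under these definitions, G-means is high only when both classes are recalled well, which is why it is often preferred over plain accuracy when offending accounts are much rarer than normal ones.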
Keywords/Search Tags: Text classification, TF-IDF, Decision Tree, Random Forest, Recall, G-means