
Research On LSTM-Based Social Network Spam Filtering

Posted on: 2020-09-02; Degree: Master; Type: Thesis
Country: China; Candidate: H M Shi
GTID: 2428330590495918; Subject: Computer technology
Abstract/Summary:
With the rapid development of the Internet, people now communicate mainly through social networks. Meanwhile, a growing volume of spam seriously affects people's daily lives and even threatens their property and personal safety. Massive amounts of spam also waste network resources. Some offenders maliciously spread false information to obtain other people's private information while also sending them advertisements. We therefore need to clean up the Internet environment and build a better social system. To meet these demands, this thesis proposes a social network information filtering method based on the LSTM algorithm. The main body is divided into the following parts.

First, we analyzed the characteristics of data sets drawn from a large volume of social network information and concluded that normal information far outnumbers spam. This thesis first adopted the standard SMOTE algorithm to handle the imbalance. To address the shortcomings of SMOTE, we proposed an improved algorithm called GP-TSMOTE to balance the social network information data sets.

Second, we started from the characteristics of the information text. The text contains many distractions that affect the classification results, such as digits, symbols, and emoticons. We therefore cleaned the information text so that junk text and normal text can be distinguished. We also needed to select feature words from the text, for which the thesis chose a method that combines information gain with the chi-square test.

Finally, the processed information texts were classified by the Naive Bayes and LSTM algorithms respectively. Comparing the advantages and disadvantages of the two algorithms, we concluded that LSTM is more suitable for filtering social network information. Based on the experimental results, we further compared the influence of the SMOTE and GP-TSMOTE algorithms on the classification results and found that the improved GP-TSMOTE algorithm outperforms SMOTE. We also adjusted the neighbor parameter K to observe its influence on the classification results of GP-TSMOTE.
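For readers unfamiliar with the oversampling step, the following is a minimal sketch of classic SMOTE only. It is not GP-TSMOTE, whose details the abstract does not give. The function name `smote`, its parameters, and the toy `spam` vectors are hypothetical, and the sketch assumes each message has already been converted to a numeric feature vector. Each synthetic sample is placed at a random point on the line segment between a minority sample and one of its K nearest minority neighbors.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic vectors interpolated between minority samples.

    Illustrative sketch of classic SMOTE; names and defaults are assumptions.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    if k < 1:
        raise ValueError("k must be at least 1")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to `base`.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        # Pick one of the k nearest neighbours at random.
        neighbour = minority[rng.choice(others[:k])]
        # Interpolate at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority-class (spam) feature vectors, invented for illustration.
spam = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote(spam, n_new=6, k=2)
print(len(new_points))  # 6
```

A larger `k` lets synthetic points spread across more of the minority region, while a smaller `k` keeps them close to existing samples, which is one reason the neighbor parameter can affect classification results.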
Keywords/Search Tags: Spam message, GP-TSMOTE, LSTM, Short text classification