Design And Implementation Of Spam Filtering System In Social Platform

Posted on: 2018-01-16
Degree: Master
Type: Thesis
Country: China
Candidate: R R Ye
GTID: 2428330545461192
Subject: Software engineering
Abstract/Summary:
With the rapid development of the Internet, social networking platforms have become an important channel for communication and bring great convenience to users. As the volume of data grows rapidly, spam seriously interferes with users' normal communication, damages their interests, and can even endanger social stability. Purifying cyberspace and building a healthy social environment is therefore urgent, and spam filtering on social platforms has become an active research topic. This thesis designs and implements a spam filtering system for social platforms based on text classification algorithms. The main contributions are as follows:

(1) The basic principles of data crawling are described, and data from the TianYa BBS is collected and labeled to build the experimental corpus. Collection is done with Python scripts that call urllib2 and BeautifulSoup to read and parse TianYa BBS pages. Labeling is done manually.

(2) Spam filtering algorithms based on text classification are studied, and five filters are implemented: k-nearest neighbor, logistic regression, support vector machine, random forest, and neural network. Their advantages and disadvantages are compared in terms of running time and filtering performance. The support vector machine gives the best filtering results but takes relatively long to run. To address the large amount of semantic information lost when text structure is ignored, a support vector machine with a word sequence kernel is applied to spam filtering. Because the original word sequence kernel has no concept of sentences, a sentence-extract word sequence kernel is proposed, which improves the accuracy of spam filtering.

(3) A social platform spam filtering system (SFS) is proposed. It comprises a data import module, a data preprocessing module, a feature selection module, a spam filtering module, and other modules. The feature selection module uses TF-IDF, information gain, expected cross entropy, and mutual information. The system is tested for both function and performance.

In testing, the system analyzed the content of posts. Posts advertising whitening and slimming products are mostly filtered, while other types of spam are filtered poorly, because most of the data collected from TianYa BBS is product advertising and very little of it belongs to other types of spam. Analyzing and filtering spam on social platforms helps software developers and users find and filter spam, reduces the amount and persistence of spam on the platform, and avoids the losses that spam causes to enterprises and individuals.
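Of the feature selection methods named in the abstract, information gain is the easiest to illustrate in a few lines. The sketch below is not the thesis implementation: it assumes each post has already been tokenized into a set of words and that labels are binary, and the toy corpus, the function names `entropy` and `information_gain`, and the label strings are all made up for illustration. It scores a term by how much knowing whether the term occurs in a post reduces the entropy of the spam/ham label.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of a distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    """Information gain of the binary feature 'term occurs in the post'.

    docs   -- list of token sets, one per post (hypothetical preprocessing)
    labels -- list of class labels aligned with docs
    """
    n = len(docs)
    if n == 0:
        return 0.0
    with_term = Counter()     # label counts among posts containing the term
    without_term = Counter()  # label counts among posts lacking the term
    for tokens, label in zip(docs, labels):
        if term in tokens:
            with_term[label] += 1
        else:
            without_term[label] += 1
    n_with = sum(with_term.values())
    n_without = n - n_with
    h_prior = entropy(list(Counter(labels).values()))
    h_conditional = (
        (n_with / n) * entropy(list(with_term.values()))
        + (n_without / n) * entropy(list(without_term.values()))
    )
    return h_prior - h_conditional


# Toy corpus, invented for illustration only.
docs = [
    {"whitening", "cream", "discount"},
    {"slimming", "tea", "discount"},
    {"weekend", "hiking", "photos"},
    {"movie", "review", "weekend"},
]
labels = ["spam", "spam", "ham", "ham"]

# Rank the vocabulary by information gain, highest first.
vocab = sorted(set().union(*docs))
ranked = sorted(
    vocab, key=lambda t: information_gain(docs, labels, t), reverse=True
)
```

In this toy corpus, "discount" and "weekend" each separate the two classes perfectly, so they score highest. A feature selection module along these lines would keep the top-ranked terms as classifier features. How the thesis sets the cutoff is not stated in the abstract.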
Keywords/Search Tags:Social platform, Spam filtering, Text classification, Feature selection, Word sequence kernel