Research On Spam Comments Recognition Based On Fusion Classifiers

Posted on:2013-01-02

Degree:Master

Type:Thesis

Country:China

Candidate:X Liu

Full Text:PDF

GTID:2218330362960682

Subject:Computer Technology and Engineering

Abstract/Summary:

With the growing popularity of the Internet, interpersonal communication has become more convenient. As a new communication platform, blogs have won wide support and have also attracted the attention of spammers. Many blogs open online comments in order to attract more users, and spam comments unrelated to the blog content seriously disrupt communication among readers and pollute web communities. To filter these spam comments effectively, this paper proposes a spam comment recognition method based on the fusion of classifiers.
First, we introduce spam comment filtering based on a Naïve Bayesian classifier. The features of spam comments were summarized by browsing a large number of comments on several well-known blogs. This paper uses regular expressions to match these 'strong features', namely the advertising information in spam comments, and treats the generalized characters as the features of spam comments in its vector space model (VSM).
Then, because spam comments change in both form and content over time, a classifier that relies only on the early training set will neither perform well nor dynamically track the needs of users. To address this defect, a self-feedback mechanism is introduced. Spam comment filtering can be treated as binary text classification; because the features of normal comments are broad and irregular while spam comments have quite obvious features, the problem can be converted into a single-class learning problem.
Finally, to make the feedback feature set more representative, this paper jointly considers the similarity between a new comment and the center vector of the existing spam comments and the Naïve Bayesian discriminant, extracts spam comments with new features as the feedback subset, and dynamically modifies the center vector of spam comments so that it constantly approaches the ideal model.
A spam comment filtering model based on fusion classifiers is proposed. The experimental results show that the modified classifier improves significantly on the traditional one and has good overall stability.
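
As an illustration only, the pipeline described in the abstract (strong-feature matching with regular expressions, a center vector of spam comments, and feedback of spam comments that carry new features) might be sketched in Python as follows. This is not the thesis's implementation: the regular expressions, the thresholds, the names `featurize`, `SpamCenter` and `feedback_step`, and the rule "Naïve Bayes is confident but the comment is dissimilar to the center" are all assumptions made for this sketch, and the Naïve Bayesian spam probability is taken as an input rather than computed.

```python
import math
import re
from collections import Counter

# Hypothetical "strong feature" patterns (assumed, not from the thesis):
# advertising cues such as URLs, e-mail addresses and long digit runs.
STRONG_PATTERNS = {
    "__URL__": re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE),
    "__EMAIL__": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "__PHONE__": re.compile(r"\b\d{7,}\b"),
}


def featurize(text):
    """Replace strong-feature matches with generalized tokens, then count terms."""
    for token, pattern in STRONG_PATTERNS.items():
        text = pattern.sub(f" {token} ", text)
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SpamCenter:
    """Running center (mean term vector) of the known spam comments."""

    def __init__(self):
        self.total = Counter()  # summed term counts over all spam seen so far
        self.n = 0              # number of spam comments folded in

    def add(self, vec):
        self.total.update(vec)
        self.n += 1

    def center(self):
        if self.n == 0:
            return {}
        return {term: count / self.n for term, count in self.total.items()}


def feedback_step(center, vec, nb_spam_prob, nb_threshold=0.9, sim_threshold=0.5):
    """Assumed feedback rule: if the Naive Bayes probability says "spam" but the
    comment is unlike the current center, treat it as spam with new features and
    fold it into the center. Returns True when the center was updated."""
    similarity = cosine(vec, center.center())
    if nb_spam_prob >= nb_threshold and similarity < sim_threshold:
        center.add(vec)
        return True
    return False
```

In use, a center built from early spam would be updated whenever `feedback_step` returns `True`, so that a later comment with the same new features already resembles the center and is no longer fed back. Both thresholds are placeholders that would need tuning on real data.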

Keywords/Search Tags:

Spam Comments, Strong Feature, Center Vector, Fusion Classifier, Incremental Feedback

Related items

1	Research On Identification And Filtering Of Spam Comments For BBS Comments
2	Research On On-line Spam Filter Fusion With Massive Data
3	Research On The Method Of Identifying Microblogging Spam Reviews
4	Recognition Method Of Microblog Spam Comment Based On CNN
5	Research On Identifying Comments Spam For Blog Comments
6	Research Of Spam Comment Identification In The Microblog Based On AdaBoost-LC
7	The Design Of Cascade Type Image Spam Filtering System
8	Research On Image Spam Filtering System
9	The Research About The Application Of Improved KNN Algorithms In Spam Filtering
10	Image Spam Detecting Based On Combinatorial And Statistical Classifier