
Research on Spam Comments Recognition Based on Fusion Classifiers

Posted on: 2013-01-02
Degree: Master
Type: Thesis
Country: China
Candidate: X Liu
GTID: 2218330362960682
Subject: Computer Technology and Engineering
Abstract:
With the growing popularity of the Internet, interpersonal communication has become more convenient. As a new communication platform, blogs have won wide support from users, and they have also attracted the attention of spammers. Many blogs are open to online comments in order to attract more users, and spam comments that have nothing to do with the content of a blog post seriously disrupt communication among readers and pollute web communities. To filter these spam comments effectively, this thesis proposes a spam comment recognition method based on the fusion of classifiers.
First, we introduce spam comment filtering based on a Naïve Bayesian classifier. The features of spam comments were summarized by browsing large numbers of comments on several well-known blogs. This thesis uses regular expressions to match these "strong features", namely the advertising information in spam comments, and treats the resulting generalized strings as spam features in its vector space model (VSM).
Second, because spam comments change in both form and content over time, a classifier that relies only on its early training set will neither perform well nor dynamically track users' needs. To address this defect, a self-feedback mechanism is introduced. Spam comment filtering can be treated as binary text classification; because the features of normal comments are broad and irregular while spam comments have quite obvious features, the problem can be converted into a single-class learning problem.
Finally, to improve how well the feedback feature set represents spam, this thesis jointly considers the similarity between a new comment and the center vector of the existing spam comments together with the Naïve Bayesian discriminant, and uses them to extract spam comments with new features as a feedback subset. The center vector of spam comments is modified dynamically, so the model continually approaches the ideal model.
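The abstract does not give the thesis's actual regular expressions or classifier details, so the following Python code is only a minimal sketch of the first stage under stated assumptions. The `STRONG_FEATURES` patterns, the placeholder tokens such as `__URL__`, and the names `tokenize` and `NaiveBayes` are hypothetical; the classifier is a standard multinomial Naïve Bayes with add-one smoothing over a bag-of-words (VSM-style) representation.

```python
import math
import re
from collections import Counter

# Hypothetical "strong feature" patterns (assumption): the thesis derives its
# own from observed blog spam; these only illustrate the idea.
STRONG_FEATURES = {
    "__URL__": re.compile(r"https?://\S+|www\.\S+"),
    "__PHONE__": re.compile(r"\b\d{3}[- ]?\d{4}[- ]?\d{4}\b"),
    "__EMAIL__": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}


def tokenize(comment):
    """Map each strong-feature match to one generalized token, then add plain words."""
    tokens = []
    for token_name, pattern in STRONG_FEATURES.items():
        tokens.extend([token_name] * len(pattern.findall(comment)))
        comment = pattern.sub(" ", comment)
    tokens.extend(re.findall(r"[a-z]+", comment.lower()))
    return tokens


class NaiveBayes:
    """Multinomial Naive Bayes over token counts, with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def train(self, comment, label):
        self.word_counts[label].update(tokenize(comment))
        self.doc_counts[label] += 1

    def log_score(self, comment, label):
        """log P(label) + sum of log P(token | label); assumes both classes were trained."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total = sum(self.word_counts[label].values())
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for tok in tokenize(comment):
            score += math.log((self.word_counts[label][tok] + 1) / (total + len(vocab)))
        return score

    def discriminant(self, comment):
        """Log-odds of spam vs. normal; a positive value means 'spam'."""
        return self.log_score(comment, "spam") - self.log_score(comment, "ham")


# Toy usage with made-up comments.
nb = NaiveBayes()
nb.train("great post, thanks for sharing", "ham")
nb.train("interesting analysis of the topic", "ham")
nb.train("cheap watches at http://example.com", "spam")
nb.train("call 138 1234 5678 for cheap loans", "spam")
print(nb.discriminant("cheap deals at www.example.org") > 0)  # True
```

Replacing every regex match with one generalized token means that any URL or phone number counts as the same feature, which is the point of treating strong features generically; the sign of `discriminant` gives the Naïve Bayesian decision.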
A spam comment filtering model based on fusion classifiers is thus proposed. Experimental results show that the modified classifier performs significantly better than the traditional one and has good overall stability.
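The feedback step that fuses centroid similarity with the Naïve Bayesian discriminant can be sketched as follows. This is a minimal illustration, not the thesis's algorithm: the linear weighting `alpha`, the `threshold`, the logistic mapping of the Naïve Bayesian log-odds, and the names `SpamCentroid`, `fused_score`, and `feedback` are all assumptions. The abstract does not state the exact criterion for favoring comments with new features, so this sketch simply takes each comment's precomputed discriminant, blends it with the cosine similarity to the spam center vector, and folds comments above the threshold into the center vector as a running mean.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def sigmoid(z):
    """Numerically stable logistic function: maps NB log-odds to (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class SpamCentroid:
    """Center vector of known spam, kept as a running mean (incremental update)."""

    def __init__(self):
        self.center = {}
        self.n = 0

    def add(self, vec):
        # c_new = c_old + (x - c_old) / n, so past vectors need not be stored.
        self.n += 1
        for t in set(self.center) | set(vec):
            old = self.center.get(t, 0.0)
            self.center[t] = old + (vec.get(t, 0.0) - old) / self.n


def fused_score(vec, centroid, nb_discriminant, alpha=0.5):
    # Hypothetical fusion rule (assumption): linear blend of centroid
    # similarity and the Naive Bayes spam probability.
    return alpha * cosine(vec, centroid.center) + (1 - alpha) * sigmoid(nb_discriminant)


def feedback(comments, centroid, threshold=0.6, alpha=0.5):
    """Select (text, nb_discriminant) pairs whose fused score clears the
    threshold and fold them into the spam center vector."""
    selected = []
    for text, nb_disc in comments:
        vec = Counter(text.lower().split())
        if fused_score(vec, centroid, nb_disc, alpha) >= threshold:
            centroid.add(vec)
            selected.append(text)
    return selected


# Toy usage: seed the center vector with one known spam comment.
centroid = SpamCentroid()
centroid.add(Counter("cheap loans call now".split()))
incoming = [("cheap loans here call today", 3.0), ("nice photos from the trip", -4.0)]
print(feedback(incoming, centroid))  # ['cheap loans here call today']
```

The running-mean update lets the center vector drift toward newly observed spam without retraining on the full history, which matches the incremental spirit of the self-feedback mechanism; a real system would also need a rule for when to trust the fused score.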
Keywords: Spam Comments, Strong Feature, Center Vector, Fusion Classifier, Incremental Feedback