
Research On Spam Filtering Method Based On Social Networks

Posted on: 2017-05-23
Degree: Master
Type: Thesis
Country: China
Candidate: W Y Yu
Full Text: PDF
GTID: 2348330488459960
Subject: Software engineering
Abstract/Summary:
Spam filtering for social networks currently relies on two core methods: rule-based pattern matching and machine learning on imbalanced data sets. With the spread of smart terminal devices, the volume of real-time social network data is growing explosively, so combining the two methods has become the main approach to spam filtering. The first-level filter uses rule-based pattern matching to achieve high processing speed, and the second-level filter uses machine learning to achieve higher classification ability. The research work covers two main aspects, as follows.

For rule-based pattern matching, an FPGA-based multiple pattern matching algorithm called ACF is proposed to improve processing speed. ACF starts from the AC state automaton, removes the failure function, and builds an ACF state automaton on 4-bit units organized as a 16-way tree. Experimental results show that ACF is feasible: its processing performance is significantly better than that of comparable algorithms, and it can be used effectively for spam filtering.

For machine learning on imbalanced data sets, a resampling algorithm called SDR is proposed to improve classification ability. SDR combines undersampling and oversampling. In the oversampling stage, it generates new samples by making full use of the spatial distribution characteristics of the data. In the undersampling stage, it uses clustering to retain the important information in the majority class. SDR further improves classification performance through iterative optimization and noise removal. Classification experiments show that SDR is feasible: its classification ability is higher than that of comparable algorithms, and it can be used more effectively for spam filtering.

Both algorithms address the problem of spam filtering in social networks more effectively.
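The idea of a 16-way automaton over 4-bit units with no failure function can be illustrated in software. The sketch below is not the thesis's ACF design, which targets FPGA hardware and is not detailed in the abstract. It assumes that "AC" refers to Aho-Corasick and that the failure function is removed by precomputing every transition at build time, which is one common way to do it. The function names `to_nibbles`, `build_nibble_automaton`, and `search` are hypothetical.

```python
from collections import deque


def to_nibbles(data: bytes):
    """Split each byte into its high and low 4-bit halves."""
    for b in data:
        yield b >> 4
        yield b & 0x0F


def build_nibble_automaton(patterns):
    """Build a 16-way transition table that needs no failure links at match time."""
    delta = [[-1] * 16]   # delta[state][nibble]; -1 marks a missing trie edge
    outputs = [set()]     # patterns recognised on entering each state
    for pat in patterns:
        state = 0
        for nib in to_nibbles(pat):
            if delta[state][nib] == -1:
                delta.append([-1] * 16)
                outputs.append(set())
                delta[state][nib] = len(delta) - 1
            state = delta[state][nib]
        outputs[state].add(pat)

    # Failure links are used only here, to fill in every missing edge.
    fail = [0] * len(delta)
    queue = deque()
    for nib in range(16):
        nxt = delta[0][nib]
        if nxt == -1:
            delta[0][nib] = 0
        else:
            queue.append(nxt)
    while queue:
        state = queue.popleft()
        outputs[state] |= outputs[fail[state]]
        for nib in range(16):
            nxt = delta[state][nib]
            if nxt == -1:
                delta[state][nib] = delta[fail[state]][nib]
            else:
                fail[nxt] = delta[fail[state]][nib]
                queue.append(nxt)
    return delta, outputs


def search(data: bytes, delta, outputs):
    """Return (byte_offset, pattern) pairs; one table lookup per nibble."""
    state = 0
    hits = []
    for i, nib in enumerate(to_nibbles(data)):
        state = delta[state][nib]
        if i % 2 == 1:  # report only on byte boundaries to skip misaligned matches
            byte_end = i // 2 + 1
            for pat in outputs[state]:
                hits.append((byte_end - len(pat), pat))
    return hits
```

Each state has exactly 16 outgoing edges, so a lookup is a fixed-size table read, which is the kind of structure that maps naturally onto hardware. The trade-off is that each input byte costs two transitions instead of one.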
Keywords/Search Tags: Spam Filtering, Multiple Pattern Matching, FPGA, Resampling
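The combined resampling described in the abstract can be sketched as follows. This is not the SDR algorithm itself, whose details the abstract does not give. As stand-ins, the sketch uses SMOTE-style interpolation between nearby minority samples for oversampling and plain k-means for the clustering-based undersampling, and it omits the iterative optimization and noise removal steps. All function names and parameters (`oversample`, `undersample`, `resample`, `k`, `iters`) are hypothetical.

```python
import math
import random


def oversample(minority, n_new, k=3, rng=None):
    """Create n_new synthetic points, each on a segment between two nearby minority points."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        t = rng.random()
        synthetic.append(tuple(b + t * (o - b) for b, o in zip(base, other)))
    return synthetic


def undersample(majority, n_keep, iters=10, rng=None):
    """Cluster the majority class with k-means and keep one real sample per cluster."""
    rng = rng or random.Random(0)
    centroids = rng.sample(majority, n_keep)
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in majority:
            nearest = min(range(n_keep), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centroids[c]
            for c, members in enumerate(clusters)
        ]
    # Keep the member closest to each centroid; empty clusters contribute nothing.
    return [
        min(members, key=lambda p: math.dist(p, centroids[c]))
        for c, members in enumerate(clusters) if members
    ]


def resample(majority, minority, rng=None):
    """Move both classes toward a common size halfway between the two originals."""
    rng = rng or random.Random(0)
    target = (len(majority) + len(minority)) // 2
    new_minority = minority + oversample(minority, target - len(minority), rng=rng)
    new_majority = undersample(majority, target, rng=rng)
    return new_majority, new_minority
```

The sketch expects points as tuples of floats, at least two minority samples, and a majority class at least as large as the minority class. Picking one representative per cluster keeps samples from every region of the majority class, which is one plausible reading of "keeping the important information" in the abstract.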