With the rapid development of the Internet, e-mail, one of the earliest Internet applications, has become an indispensable communication tool in daily life and work. However, the emergence and proliferation of spam consume substantial storage resources and network bandwidth. Meanwhile, viruses, pornography, fraud, reactionary content and other junk information carried by e-mail seriously disrupt normal network usage. Guaranteeing the safety and health of e-mail content, one of the most widely used applications on the Internet, is therefore a problem that demands a prompt solution. In recent years, to evade detection by text-based spam filtering systems, spammers have embedded junk information in images and attached them to the message body. Traditional text-based filters cannot handle such image spam. To deal with spam that contains both text and images, this paper proposes a filtering method that fuses text, image and other multi-modal features. First, text features and image features are extracted from each e-mail. Second, P-SVM is trained on the different feature sets to construct a text-based classifier and an image-based classifier. Finally, multiple classifier fusion is applied to integrate the outputs of these classifiers, so that the text and image features are fused. The method combines the strengths of text filtering and image filtering in spam filtering and achieves multi-modal feature fusion. Experiments on the TREC dataset show that the fusion method outperforms each single classifier and achieves an accuracy above 90%.
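The final fusion step can be illustrated with a minimal sketch. The abstract does not state which fusion rule is used, so this example assumes that each trained classifier (text-based and image-based) outputs a spam probability in [0, 1] and combines those outputs with a weighted sum rule, one common multiple-classifier fusion scheme. The function name `fuse_scores`, the weights, and the 0.5 decision threshold are hypothetical choices for illustration; P-SVM training and feature extraction are not shown.

```python
from typing import Optional, Sequence, Tuple


def fuse_scores(
    scores: Sequence[float],
    weights: Optional[Sequence[float]] = None,
    threshold: float = 0.5,
) -> Tuple[float, bool]:
    """Hypothetical weighted sum-rule fusion of per-classifier spam probabilities.

    Assumption: each score is one classifier's estimated probability that the
    message is spam (e.g. scores[0] from the text classifier, scores[1] from
    the image classifier). This is an illustrative rule, not the paper's method.
    """
    if not scores:
        raise ValueError("need at least one classifier score")
    if weights is None:
        # Equal weights reduce the sum rule to a plain average.
        weights = [1.0] * len(scores)
    if len(weights) != len(scores):
        raise ValueError("weights and scores must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Weighted average of the classifier outputs.
    fused = sum(w * s for w, s in zip(weights, scores)) / total
    # Label the message as spam when the fused score reaches the threshold.
    return fused, fused >= threshold


if __name__ == "__main__":
    # Text classifier is unsure (0.3); image classifier is confident (0.9).
    fused, is_spam = fuse_scores([0.3, 0.9])
    print(f"fused={fused:.2f} spam={is_spam}")
```

With equal weights, a message whose text looks harmless but whose attached image is flagged still averages to about 0.6 and is labeled spam, which is the kind of case a single text-based filter would miss. In practice the weights would be tuned on held-out data.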