Design And Implementation Of Spam Filtering System Based On Vector Space Model

Posted on: 2016-03-08
Degree: Master
Type: Thesis
Country: China
Candidate: K J Zhu
GTID: 2308330461483090
Subject: Software engineering
Abstract/Summary:
In the Internet age, more and more people communicate online, and e-mail is one of the most popular channels. An e-mail system lets people communicate easily and quickly, but users frequently receive spam. The current proliferation of spam causes serious trouble for both networks and users, and it makes reading and managing mail cumbersome, so spam filtering is essential. With these considerations in mind, this thesis develops an accurate and fast spam filtering system based on the vector space model; it filters spam according to mail content and makes mail easier to manage. The work proceeds as follows.

First, mail classification. The Fudan corpus is chosen as the training set for the various mail categories, and each received mail is represented as a vector. During training, preprocessing, feature extraction, weight calculation, and threshold setting are applied to derive a feature vector and a threshold for each mail category. The simple vector distance method is then used to compute the similarity between a message vector and each category feature vector; the maximum similarity is compared with the threshold to assign the message to a category.

Second, spam filtering. The CCERT spam corpus is chosen as the training set for the spam filter. The similarity between a mail document and the feature vectors generated from the spam set is computed, the maximum is taken and compared with a threshold, and the result determines whether the mail is spam.

The system was developed on the MyEclipse 6.5 platform with a client/server (C/S) architecture and implemented in Java as a content-based spam filter. It not only improves the filtering rate but also facilitates e-mail management.
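The training steps named in the abstract (preprocessing, feature extraction, weight calculation) can be sketched in Java, the language the system was written in. This is a minimal illustration under assumptions the abstract does not state: mails arrive already tokenized (Chinese text from the Fudan corpus would first need word segmentation), the stop-word list is a toy one, feature extraction keeps terms whose document frequency reaches a hypothetical `minDf`, weights follow the common TF-IDF scheme tf × log(N / df), and a category's feature vector is taken to be the mean (centroid) of its training vectors. The class `TfIdfTrainer` and its methods are invented for illustration.

```java
import java.util.*;

/**
 * Hypothetical sketch of the training side of a vector space model.
 * Assumptions (not from the thesis): input is pre-tokenized, stop words come
 * from a toy list, features are terms with document frequency >= minDf,
 * weights are tf * log(N / df), and a category vector is the centroid.
 */
final class TfIdfTrainer {
    // Toy stop-word list used only for illustration.
    private static final Set<String> STOP_WORDS = Set.of("the", "a", "of", "and", "to");

    /** Preprocessing: lower-case each token and drop stop words and empty tokens. */
    static List<String> preprocess(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            String w = t.toLowerCase(Locale.ROOT);
            if (!w.isEmpty() && !STOP_WORDS.contains(w)) out.add(w);
        }
        return out;
    }

    /** Feature extraction: document frequency of each term, keeping only df >= minDf. */
    static Map<String, Integer> selectFeatures(List<List<String>> docs, int minDf) {
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : docs) {
            // Count each term at most once per document.
            for (String term : new HashSet<>(doc)) df.merge(term, 1, Integer::sum);
        }
        df.values().removeIf(count -> count < minDf);
        return df;
    }

    /** Weight calculation: tf * log(N / df) for every selected feature in one document. */
    static Map<String, Double> tfIdf(List<String> doc, Map<String, Integer> df, int numDocs) {
        Map<String, Integer> tf = new HashMap<>();
        for (String term : doc) tf.merge(term, 1, Integer::sum);
        Map<String, Double> vec = new HashMap<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            Integer d = df.get(e.getKey());
            if (d == null) continue; // not a selected feature: ignored
            vec.put(e.getKey(), e.getValue() * Math.log((double) numDocs / d));
        }
        return vec;
    }

    /** Category feature vector: component-wise mean of the category's document vectors. */
    static Map<String, Double> centroid(List<Map<String, Double>> vectors) {
        Map<String, Double> sum = new HashMap<>();
        for (Map<String, Double> v : vectors) v.forEach((term, w) -> sum.merge(term, w, Double::sum));
        sum.replaceAll((term, w) -> w / vectors.size());
        return sum;
    }

    public static void main(String[] args) {
        List<List<String>> docs = List.of(
                preprocess(List.of("Free", "offer", "the", "offer")),
                preprocess(List.of("Meeting", "agenda", "offer")));
        Map<String, Integer> df = selectFeatures(docs, 1);
        List<Map<String, Double>> vectors = new ArrayList<>();
        for (List<String> d : docs) vectors.add(tfIdf(d, df, docs.size()));
        System.out.println("document vectors: " + vectors);
        System.out.println("category vector: " + centroid(vectors));
    }
}
```

In the `main` example, `offer` occurs in every training document, so this unsmoothed IDF gives it weight 0; that is one reason smoothed IDF variants are often preferred in practice.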
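The abstract says a threshold is set during training but not how it is chosen. One plausible approach, assumed here rather than taken from the thesis, is to score each labeled training mail by its maximum similarity to the class feature vectors, then pick the candidate threshold that classifies the most training mails correctly (for example, CCERT spam versus normal mail). `ThresholdSelector` and `selectThreshold` are hypothetical names.

```java
import java.util.*;

/**
 * Hypothetical sketch of threshold setting by accuracy maximisation.
 * Assumption (not from the thesis): scores[i] is the maximum similarity of
 * training mail i, and positive[i] says whether it truly belongs to the class.
 */
final class ThresholdSelector {
    /**
     * Returns the observed score t that maximises training accuracy when
     * predicting "positive" for score >= t. Ties keep the lowest t.
     */
    static double selectThreshold(double[] scores, boolean[] positive) {
        if (scores.length == 0 || scores.length != positive.length) {
            throw new IllegalArgumentException("need equal-length, non-empty inputs");
        }
        double[] candidates = scores.clone();
        Arrays.sort(candidates); // ascending, so ties resolve to the lowest threshold
        double bestT = candidates[0];
        int bestCorrect = -1;
        for (double t : candidates) {
            int correct = 0;
            for (int i = 0; i < scores.length; i++) {
                boolean predicted = scores[i] >= t;
                if (predicted == positive[i]) correct++;
            }
            if (correct > bestCorrect) {
                bestCorrect = correct;
                bestT = t;
            }
        }
        return bestT;
    }

    public static void main(String[] args) {
        double[] scores = {0.92, 0.75, 0.40, 0.18, 0.66};
        boolean[] isSpam = {true, true, false, false, true};
        System.out.println("selected threshold: " + selectThreshold(scores, isSpam));
    }
}
```

Trying every observed score is quadratic in the number of training mails, which is acceptable for a sketch; sorting once and sweeping with running counts would reduce it to O(n log n). The candidate set also omits a threshold above every score (rejecting all mail).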
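Both the classification step and the spam filtering step described above reduce to the same decision: compute the similarity of a mail vector to each feature vector, take the maximum, and compare it with a threshold. The sketch below assumes the "simple vector distance method" is cosine similarity between sparse term-weight vectors, a common choice in vector space classification that the abstract does not confirm. `VsmClassifier`, `classify`, and `isSpam` are hypothetical names.

```java
import java.util.*;

/**
 * Hypothetical sketch of classification by maximum similarity plus a threshold.
 * Assumption: the "simple vector distance method" is taken to be cosine
 * similarity between sparse term-weight vectors; the thesis does not say so.
 */
final class VsmClassifier {
    /** Cosine similarity of two sparse vectors; 0 if either vector has zero length. */
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double w = b.get(e.getKey());
            if (w != null) dot += e.getValue() * w;
        }
        double na = norm(a), nb = norm(b);
        return (na == 0.0 || nb == 0.0) ? 0.0 : dot / (na * nb);
    }

    private static double norm(Map<String, Double> v) {
        double s = 0.0;
        for (double w : v.values()) s += w * w;
        return Math.sqrt(s);
    }

    /** Mail classification: the most similar category, or empty if below the threshold. */
    static Optional<String> classify(Map<String, Double> mail,
                                     Map<String, Map<String, Double>> categoryVectors,
                                     double threshold) {
        String best = null;
        double bestSim = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Map<String, Double>> e : categoryVectors.entrySet()) {
            double sim = cosine(mail, e.getValue());
            if (sim > bestSim) {
                bestSim = sim;
                best = e.getKey();
            }
        }
        return (best != null && bestSim >= threshold) ? Optional.of(best) : Optional.empty();
    }

    /** Spam filtering: spam if the maximum similarity to any spam feature vector reaches the threshold. */
    static boolean isSpam(Map<String, Double> mail, List<Map<String, Double>> spamVectors, double threshold) {
        double max = Double.NEGATIVE_INFINITY; // an empty spam set never flags mail
        for (Map<String, Double> s : spamVectors) max = Math.max(max, cosine(mail, s));
        return max >= threshold;
    }

    public static void main(String[] args) {
        Map<String, Double> mail = Map.of("free", 1.0, "offer", 1.0);
        Map<String, Double> spam = Map.of("free", 2.0, "offer", 2.0, "winner", 1.0);
        Map<String, Double> work = Map.of("meeting", 1.5, "agenda", 1.0);
        System.out.println(classify(mail, Map.of("spam", spam, "work", work), 0.5));
        System.out.println(isSpam(mail, List.of(spam), 0.5));
    }
}
```

An empty `Optional` models a mail that is not close enough to any category; how the original system handles such mail is not described in the abstract.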
Keywords/Search Tags: Mail classification, Mail filtering, Training process, Vector space model, Simple vector distance method