
SVM-Based Novel Method Of Online Spam Filtering

Posted on: 2014-01-16 | Degree: Master | Type: Thesis
Country: China | Candidate: N Jiang | Full Text: PDF
GTID: 2248330395497474 | Subject: Computer application technology
Abstract/Summary:
Internet technology is now widely used and is developing rapidly, bringing great convenience to people's lives. However, while it provides people with massive amounts of information, it also raises another problem: a large amount of junk information circulates on the network, and e-mail is a common vehicle for it.
Junk e-mail, known internationally as "spam", is mail that recipients neither expect nor want to receive. Spam does considerable damage to today's Internet communication: it consumes network resources, causes network congestion, and degrades the user experience. Much spam is intended to violate users' privacy, and it wastes time and mailbox space. Worse, spam sent for profit harms users' legitimate rights and interests. In addition, some spam carries viruses; when the user clicks on it, personal data may be stolen, modified, or deleted, disrupting the user's daily life and work.
Given these hazards, an anti-spam method or tool is urgently needed.
The study of anti-spam technology started in 1993. At this stage the work mainly concerned collecting and discussing spam, and some simple anti-spam techniques gradually emerged, such as e-mail blacklists. In 1998, discussion of how to filter spam effectively expanded, and many well-known services and organizations were established, the more familiar of which include ORBS, SpamCop, Spamhaus, and MAPS. Since 1999, anti-spam technology has drawn more and more attention, and many well-known domestic and foreign organizations and research institutions have begun to fight spam. Many advanced techniques, such as machine learning and genetic algorithms, have been successfully applied to this field, attracting the attention of many interdisciplinary researchers.
Existing spam filtering algorithms have the following problems: 1) during classification, documents are often misclassified, so classification accuracy is not very high; 2) traditional algorithms cannot guarantee online, real-time filtering of spam.
In response to these problems, this thesis presents an online spam filtering algorithm based on support vector machines (SVM). Because an SVM's classification results depend only on the set of support vectors in each training set, the algorithm filters out all non-support vectors and replaces the original training set with the remaining support vectors, which effectively reduces redundant samples without lowering training speed. In addition, the algorithm defines an uncertainty factor for classification results, which determines whether a sample is put back into the training set for retraining. The algorithm not only uses the classification results of historical data but also improves its recognition accuracy by learning from misclassified samples. Finally, simulation experiments are carried out.
The results show that the algorithm is fast, can enrich the training set based on recognition results, and can perform online spam filtering while maintaining classification accuracy.
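The abstract does not give the algorithm's details, so the following is only a minimal sketch of the two ideas it names: keeping just the support vectors as the training set, and using an uncertainty factor to decide when a sample is fed back for retraining. Everything concrete here is an assumption made for illustration: the trainer (a plain dual coordinate descent for a linear SVM without a bias term), the names `train_svm`, `OnlineSpamFilter`, `classify`, and `learn`, the two-feature toy messages (count of spam-like words, count of ordinary words), and the reading of "uncertain" as a decision score whose absolute value falls below a threshold.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_svm(samples, C=10.0, passes=50):
    """Linear SVM without a bias term, trained by dual coordinate descent.

    samples is a list of (features, label) pairs with label +1 (spam) or -1.
    Returns the weight vector and one dual coefficient per sample; a sample
    is a support vector when its coefficient is greater than zero.
    """
    w = [0.0] * len(samples[0][0])
    alpha = [0.0] * len(samples)
    for _ in range(passes):
        for i, (x, y) in enumerate(samples):
            q = dot(x, x)
            if q == 0:
                continue  # an all-zero feature vector carries no information
            grad = y * dot(w, x) - 1.0
            new = min(max(alpha[i] - grad / q, 0.0), C)
            step = (new - alpha[i]) * y
            w = [wj + step * xj for wj, xj in zip(w, x)]
            alpha[i] = new
    return w, alpha


class OnlineSpamFilter:
    def __init__(self, seed_samples, uncertainty=1.0):
        self.uncertainty = uncertainty  # assumed form of the uncertainty factor
        self.train_set = list(seed_samples)
        self._retrain()

    def _retrain(self):
        self.w, alpha = train_svm(self.train_set)
        # Keep only the support vectors as the new training set.
        support = [s for s, a in zip(self.train_set, alpha) if a > 1e-8]
        if {y for _, y in support} == {1, -1}:  # guard: keep both classes
            self.train_set = support

    def classify(self, x):
        """Return (label, uncertain) for one feature vector."""
        score = dot(self.w, x)
        return (1 if score >= 0 else -1), abs(score) < self.uncertainty

    def learn(self, x, true_label):
        """Retrain only if the sample was uncertain or misclassified."""
        label, uncertain = self.classify(x)
        if uncertain or label != true_label:
            self.train_set.append((x, true_label))
            self._retrain()
            return True
        return False


# Hypothetical seed data: (spam-like word count, ordinary word count).
seed = [((3, 0), 1), ((2, 0), 1), ((4, 1), 1),
        ((0, 3), -1), ((0, 2), -1), ((1, 4), -1)]
spam_filter = OnlineSpamFilter(seed)
```

In this sketch the six seed messages shrink to the two that lie on the margin, and a later message is added to the training set only when its score is inside the uncertainty band or its predicted label turns out to be wrong. A real filter would need a proper feature extractor and a bias term, and the thesis may define the uncertainty factor differently.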
Keywords/Search Tags: Spam, Anti-spam, Blacklist, Machine Learning, Genetic Algorithm, Support Vector Machine