
The Research On Chinese Spams' Identification Based On SVM

Posted on: 2007-03-21
Degree: Master
Type: Thesis
Country: China
Candidate: S J Qiao
Full Text: PDF
GTID: 2178360275957662
Subject: Management Science and Engineering
Abstract/Summary:
With the rapid development of the Internet, e-mail has become one of the fastest and most economical means of communication. However, a large volume of Chinese spam is also sent over the Internet, including commercial advertisements, mass-disseminated advertisements, and malicious mail. Spam not only consumes mail server capacity but also wastes the time and energy users spend managing it, which lowers enterprise efficiency and damages users' legal rights and interests. How to process Chinese mail and identify Chinese spam is therefore a major concern for users. Some studies have applied techniques for spam feature extraction (feature abstraction), but these techniques have shortcomings, and problems remain in identifying Chinese spam. It is therefore worthwhile to explore an effective approach to Chinese spam identification.

Support Vector Machine (SVM), which is based on statistical learning theory, is one of the most important techniques in data mining. It is particularly well suited to pattern recognition problems with limited samples, nonlinearity, and high dimensionality. In addition, SVM has achieved satisfactory results in text classification.

To address the shortcomings of existing Chinese spam identification techniques, an approach based on the sequential minimal optimization (SMO) algorithm for SVM classification is proposed to extract the features of Chinese spam. The approach has three steps. First, the Maximum Matching (MM) method segments a mail document into individual words. Next, the Vector Space Model (VSM) converts the mail document into a vector. Finally, the SMO algorithm is used to identify Chinese spam. In addition, several SVM algorithms are studied, especially SMO and its use in feature extraction from mail documents.

Analysis of Chinese spam shows that its features are complex. To find the features that can identify Chinese spam, new text mining and text classification techniques, especially the SMO algorithm, are used. The experimental results on limited simulated data are fairly satisfactory, which suggests that these algorithms are applicable to Chinese spam identification.
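
The first two steps named in the abstract, Maximum Matching segmentation and the Vector Space Model, can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the function names `forward_max_match` and `to_vector`, the toy dictionary and vocabulary, the `max_len` limit, and the use of raw term counts as vector weights are all assumptions made for the example. The forward (left-to-right) variant of maximum matching is also an assumption, since the abstract does not say which variant is used.

```python
from collections import Counter


def forward_max_match(text, dictionary, max_len=4):
    """Greedy forward maximum matching.

    At each position, take the longest substring (up to max_len characters)
    that is a dictionary word; fall back to a single character otherwise.
    """
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until something matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


def to_vector(words, vocabulary):
    """Map a segmented document to a term-count vector over a fixed vocabulary."""
    counts = Counter(words)
    return [counts[term] for term in vocabulary]


# Hypothetical toy dictionary and vocabulary, chosen only for illustration.
dictionary = {"免费", "发票", "优惠", "会议", "通知"}
vocabulary = ["免费", "发票", "优惠", "会议", "通知"]

words = forward_max_match("免费发票优惠", dictionary)
vector = to_vector(words, vocabulary)
print(words)
print(vector)
```

Vectors of this kind, paired with spam or non-spam labels, are the sort of input an SVM trainer such as SMO would consume in the third step.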
Keywords/Search Tags: Chinese Spam, feature abstraction, SVM, SMO