Research On Spam Behavior Recognition Technology

Posted on: 2012-05-23  Degree: Master  Type: Thesis
Country: China  Candidate: X J Li  Full Text: PDF
GTID: 2218330338466606  Subject: Computer application technology
Abstract/Summary:
With the popularity of the Internet, E-mail has become one of the favored means of communication in modern life. However, because of a technical flaw in the E-mail system, namely that SMTP (Simple Mail Transfer Protocol) performs no authentication when sending E-mail, spammers lured by huge profits have the chance to send large volumes of spam. This causes trouble for users and seriously wastes Internet resources.

In order to build a clean ("green") E-mail communication environment, the author analyzes and compares a large number of spam and legitimate E-mails, and finds that the two differ in sending behavior, a difference caused by the motives and psychology of spammers. First, the sending information of spam shows abnormal behavior. Second, spam is always broadcast one-way in an "umbrella" shape. After in-depth research on these two behaviors, two behavior-based filtering technologies are proposed.

The first is a spam filtering technology based on sending behavior characteristics. By studying the SMTP protocol, the author identifies behavior characteristics that are useful for distinguishing spam from legitimate E-mail. Guided by the Unified Theory of Information-Knowledge-Intelligence and related data mining theory, E-mail sending behavior characteristics are mined and a behavior recognition model is established, so that spam can be blocked at the E-mail transfer stage. The main work in this part includes analyzing and extracting E-mail header information, analyzing and extracting behavior characteristics, representing E-mails as vectors, and establishing the behavior recognition model. To improve recognition accuracy, a step that calculates the contribution of each characteristic is added. A classification algorithm is trained on the training set to establish the behavior recognition model, which is then used to judge whether an E-mail is spam. In this paper the SVM and Naive Bayes classification algorithms are selected, and the classification experiments are carried out in the Weka environment.

The second is a behavior recognition technology based on topology structure similarity, that is, a spam recognition technology based on spam traffic behavior. By analyzing the features of the topology structure of E-mail communication, the author finds that legitimate E-mail and spam differ in topology structure. The concept of similarity is introduced: by comparing the similarity of E-mail users' communication relations, the users are divided into different sets, and the probability of sending or receiving spam is counted for each set. By determining which sets the sender and receivers of an E-mail belong to, these probabilities are used to calculate and determine whether the E-mail is spam.

Finally, the evaluation criteria are the recall rate, the precision rate, and the F1 value. Simulation results show that the behavior recognition technologies proposed in this paper achieve good filtering performance, and a comparison with simulation results reported in other literature indicates that the proposed technologies perform better.
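The three evaluation criteria named above can be illustrated with a minimal Python sketch. It assumes the standard definitions (precision = TP / (TP + FP), recall = TP / (TP + FN), F1 = harmonic mean of the two) with spam as the positive class. The function name `precision_recall_f1`, the label strings, and the sample data are hypothetical and are not taken from the thesis, whose experiments were run in Weka.

```python
def precision_recall_f1(y_true, y_pred, positive="spam"):
    """Return (precision, recall, F1) for the given positive class.

    y_true and y_pred are equal-length sequences of labels.
    Undefined ratios (zero denominator) are reported as 0.0.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Made-up labels for illustration only, not thesis data.
    truth = ["spam", "spam", "spam", "ham", "ham"]
    predicted = ["spam", "spam", "ham", "spam", "ham"]
    p, r, f = precision_recall_f1(truth, predicted)
    print(f"precision={p:.3f} recall={r:.3f} F1={f:.3f}")
```

For a spam filter, recall measures how much of the spam is caught, while precision measures how many of the flagged messages really are spam; F1 combines the two into a single figure.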
Keywords/Search Tags: spam, sending behavior, behavior recognition, topology structure, similarity