
Study On Email Classifying Technique Based On Data Mining

Posted on: 2005-03-11
Degree: Master
Type: Thesis
Country: China
Candidate: Y Li
GTID: 2168360125463895
Subject: Computer system architecture
Abstract/Summary:
With the widespread use of the Internet, email has become one of the main ways people communicate, but the spam that comes with it has become a serious problem. Statistics indicate that the United States suffers losses of about one billion every year because of spam. According to the Statistical Report on Internet Development in China issued by the China Internet Network Information Center in July 2003, 9 of every 16 emails received by Chinese Internet users are spam, so spam now outnumbers legitimate email. In China, large volumes of spam consume network bandwidth and can bring email servers to a halt. Because spam is intrusive, deceptive, often unwholesome, and sent repeatedly, it seriously disrupts people's normal use of email: it wastes their time, money, and energy, carries pornographic content, and spreads falsehoods to deceive recipients, causing considerable harm to society.
Yet while spam has grown rapidly, anti-spam technology has not kept pace. Current anti-spam techniques lack intelligence and the ability to learn automatically, so they cannot identify new spam by learning from earlier spam examples. Some techniques, such as Bayesian filtering, do learn automatically, but they examine only the content of a message and ignore its header fields, which is their chief weakness. Taking Bayesian filtering as a starting point, this thesis applies data mining techniques to develop a self-learning anti-spam method. Data mining has become a core technology of business intelligence; it is widely used in many fields and has attracted broad academic attention. Artificial-intelligence algorithms and techniques, including decision trees and neural networks, have been applied in data mining for prediction, pattern recognition, classification, and clustering.
After analyzing and studying email, the thesis discretizes messages and extracts their features so that each email can be represented as a vector. It then proposes a decision-tree classification model based on information entropy. Finally, a series of experiments and tests is carried out.
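The abstract does not give the details of how the entropy-based tree is grown. As a minimal sketch, assuming the classic ID3 rule (at each node, split on the attribute with the highest information gain, i.e. the largest drop in Shannon entropy $H(S) = -\sum_i p_i \log_2 p_i$) and already-discretized feature vectors, such a tree can be written as below. The three binary features and the six labelled emails are hypothetical toy data, not the thesis's feature set or corpus, and ID3 is named here as one common entropy-based construction rather than as the thesis's exact algorithm.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(S) = -sum(p_i * log2 p_i) of a list of class labels, in bits."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy of the whole set minus the weighted entropy after splitting on column `attr`."""
    total = len(rows)
    remainder = 0.0
    for value in set(row[attr] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

def build_tree(rows, labels, attrs):
    """ID3-style induction. A leaf is a class label; an inner node is
    (attribute index, {attribute value: subtree}, majority label at that node)."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    rest = [a for a in attrs if a != best]
    branches = {}
    for value in set(row[best] for row in rows):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        branches[value] = build_tree([rows[i] for i in idx], [labels[i] for i in idx], rest)
    return (best, branches, majority)

def classify(tree, row):
    """Walk from the root to a leaf; fall back to the node's majority label for unseen values."""
    while isinstance(tree, tuple):
        attr, branches, majority = tree
        if row[attr] not in branches:
            return majority
        tree = branches[row[attr]]
    return tree

# Hypothetical toy data: (suspicious sender, ad keyword in subject, HTML-only body)
rows = [(1, 1, 0), (1, 0, 1), (1, 1, 1), (0, 1, 0), (0, 0, 0), (0, 0, 1)]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]
tree = build_tree(rows, labels, attrs=[0, 1, 2])
print(classify(tree, (1, 0, 0)))  # a new email from a suspicious sender -> spam
```

On this toy set attribute 0 separates the classes perfectly, so its information gain is 1 bit and the tree needs only one split. Real email data would produce deeper trees, and a practical version would usually add pruning or a minimum-gain threshold to limit overfitting.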
The experimental results show that the model learns to identify new spam from the header-field, network, structural, and content information of emails, demonstrating that the proposed model and method work well.
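One way to turn a raw message into the kind of discrete vector such a model consumes, covering the four kinds of information named above (header fields, network, structure, content), is sketched below with Python's standard `email` package. The specific features, bucket edges, the `SPAM_WORDS` list, and the helper names `bucket` and `email_to_vector` are illustrative assumptions; the abstract does not list the features actually used.

```python
from email import message_from_string

# Hypothetical keyword list; the thesis's actual content features are not given.
SPAM_WORDS = ("free", "winner", "discount", "invoice")

def bucket(n, edges):
    """Discretize a count: return the index of the first edge that n does not exceed."""
    for i, edge in enumerate(edges):
        if n <= edge:
            return i
    return len(edges)

def email_to_vector(raw):
    """Map a raw RFC 822 message to a tuple of small integers (a discrete feature vector)."""
    msg = message_from_string(raw)
    leaves = [p for p in msg.walk() if not p.is_multipart()]

    # Header-field information
    sender = msg.get("From", "")
    reply_to_differs = int(msg.get("Reply-To", sender) != sender)
    empty_subject = int(not msg.get("Subject", "").strip())

    # Network information: how many relays ("Received" headers) the message passed
    hop_bucket = bucket(len(msg.get_all("Received", [])), (1, 3))

    # Structure information
    types = [p.get_content_type() for p in leaves]
    has_html = int("text/html" in types)
    has_attachment = int(any(not t.startswith("text/") for t in types))

    # Content information (payloads are used as-is, without transfer decoding)
    text = " ".join(str(p.get_payload()) for p in leaves
                    if p.get_content_maintype() == "text").lower()
    keyword_bucket = bucket(sum(text.count(w) for w in SPAM_WORDS), (0, 2))

    return (reply_to_differs, empty_subject, hop_bucket,
            has_html, has_attachment, keyword_bucket)

raw = (
    "Received: from relay1\n"
    "Received: from relay2\n"
    "From: promo@example.com\n"
    "Reply-To: other@example.net\n"
    "Subject: You won\n"
    "Content-Type: text/html\n"
    "\n"
    "<p>FREE gift, you are a winner! Free!</p>\n"
)
print(email_to_vector(raw))  # (1, 0, 1, 1, 0, 2)
```

Keeping every feature to a few integer values is what lets an entropy-based tree split on them directly; continuous quantities such as hop or keyword counts are bucketed first for the same reason.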
Keywords/Search Tags: Data Mining, information entropy, decision tree, spam