Spam e-mail filtering via global and user-level dynamic ontologies

Posted on: 2010-05-02
Degree: Ph.D
Type: Thesis
University: University of Southern California
Candidate: Youn, Seongwook
Full Text: PDF
GTID: 2448390002982301
Subject: Computer Science
Abstract/Summary:
E-mail is an important communication method between people on the Internet. However, the constant increase in e-mail misuse and abuse has resulted in a huge volume of spam e-mail over recent years. Because spammers always try to find ways to evade existing filters, new filters need to be developed to catch spam. In my research to date, e-mail data was classified using four different classifiers: a Neural Network, an SVM classifier, a Naive Bayesian classifier, and a C4.5 Decision Tree (J48) classifier. Experiments were performed with different data sizes and different feature sizes, where a feature set is a set of words that characterizes the domain dataset. The final classification result should be '1' if the e-mail is actually spam and '0' otherwise. This paper shows that a simple C4.5 Decision Tree classifier, which builds a binary tree, is efficient for datasets that can be viewed as a binary tree.
We present a new approach to filtering spam e-mail using semantic information represented in ontologies. Ontologies allow for machine-understandable semantics of data [99]. Traditional keyword-based filters rely on manually constructed pattern-matching rules, but spam e-mail varies from user to user and also changes over time. Hence, an adaptive learning filtering technique is deployed in our system. An experimental system was designed and implemented to test the hypothesis that this method would outperform existing techniques, and the experimental results showed that the proposed ontology-based approach does improve spam filtering accuracy significantly. We also add a capability for handling image e-mail, using OCR to extract information from text embedded in the images. Additionally, we improve the spam filter by using a personalized ontology to make the spam decision on gray e-mail. In the proposed SPONGY (SPam ONtoloGY) system, two levels of ontology spam filters were implemented: a first-level global ontology filter and a second-level user-customized ontology filter.
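The two-level arrangement described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the SPONGY implementation: the ontology is replaced by a plain term-weight table, and the names `TermFilter` and `classify`, the thresholds `low` and `high`, and all weights are hypothetical. The sketch shows a global filter deciding the clear cases and a per-user filter deciding the gray ones, with '1' meaning spam and '0' meaning legitimate.

```python
from dataclasses import dataclass, field


@dataclass
class TermFilter:
    """Toy stand-in for an ontology filter (hypothetical): term -> spam weight."""
    weights: dict = field(default_factory=dict)

    def score(self, text: str) -> float:
        # Average weight of the words in the message; unknown words count as 0.
        words = text.lower().split()
        if not words:
            return 0.0
        return sum(self.weights.get(w, 0.0) for w in words) / len(words)


def classify(text, global_filter, user_filter, low=0.2, high=0.6):
    """Return 1 for spam, 0 for legitimate e-mail.

    The thresholds are illustrative assumptions, not values from the thesis.
    """
    g = global_filter.score(text)
    if g >= high:      # first level: clearly spam
        return 1
    if g <= low:       # first level: clearly legitimate
        return 0
    # Gray e-mail: defer to the second-level, user-customized filter.
    return 1 if user_filter.score(text) > 0 else 0


# Made-up weights for illustration only.
global_filter = TermFilter({"viagra": 1.0, "free": 0.5, "offer": 0.5, "meeting": -0.5})
wants_offers = TermFilter({"offer": -1.0})   # this user treats offers as wanted mail
hates_offers = TermFilter({"offer": 1.0})    # this user treats offers as spam

clear_spam = classify("free viagra offer", global_filter, wants_offers)
clear_ham = classify("meeting agenda tomorrow", global_filter, hates_offers)
gray_for_a = classify("free offer today", global_filter, wants_offers)
gray_for_b = classify("free offer today", global_filter, hates_offers)
```

In this toy setup the same gray message receives different labels for the two users, which is the behavior the user-level filter is meant to provide.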
The global ontology filter filtered about 91% of spam, which is comparable with other methods. The user-customized ontology filter was created based on the specific user's background, using the same filtering mechanism as in the creation of the global ontology filter. For the user-customized ontology filter, we measured the performance improvement in terms of the precision, recall, and accuracy of classification. A set of experiments showed that better classification performance (about 95%) can be achieved using the user-customized ontology filter, which is adaptive and scalable. The main contributions of the paper are (1) to introduce an ontology-based multi-level filtering technique that uses both a global ontology and an individual filter for each user to increase spam filtering accuracy, and (2) to create a spam filter in the form of an ontology, which is user-customized, scalable, and modularized, so that it can be embedded within other systems for better performance.
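For reference, precision, recall, and accuracy for a binary spam classifier can be computed as in the minimal sketch below, using the '1' = spam, '0' = legitimate labeling from the abstract. The function name `evaluate` and the label lists are made up for illustration; they are not data or code from the thesis.

```python
def evaluate(y_true, y_pred):
    """Return (precision, recall, accuracy) for binary labels, 1 = spam."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # spam caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # legitimate mail flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # spam missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # legitimate mail passed

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return precision, recall, accuracy


# Hypothetical labels, for illustration only.
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1]
precision, recall, accuracy = evaluate(y_true, y_pred)
```

Precision penalizes legitimate mail wrongly flagged as spam, while recall penalizes spam that slips through, so reporting both alongside accuracy gives a fuller picture than a single percentage.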
Keywords/Search Tags: E-mail, Spam, Filter, Ontology, User, Global