A sublexical unit based hash model approach for spam detection

Posted on:2010-06-11

Degree:Ph.D

Type:Dissertation

University:The University of Texas at San Antonio

Candidate:Zhang, Like

GTID:1448390002489500

Subject:Computer Science

Abstract/Summary:

This research introduces an original anomaly detection approach for application-level content, based on a sublexical unit hash model. The approach improves on earlier methods that rely on arbitrarily defined payload keywords or 1-gram frequency analysis. Drawing on the split fovea theory in human recognition, it uses a special hash function to identify groups of neighboring words. The hash frequency distribution is then calculated to build a profile for a specific content type. The dissertation illustrates the algorithm by applying it to the detection of spam and phishing emails.

The dissertation opens with a brief review of network intrusion and anomaly detection, followed by a discussion of recent research on application-level anomaly detection. Earlier results for anomaly detection based on payload keywords and byte frequency are also presented. Chapter 2 closes with the drawbacks of N-gram analysis, which most related research has relied on. The importance of text content analysis to application-level anomaly detection is also explained.

After background on the split fovea theory from psychological research, the proposed method based on sublexical unit hash frequency distributions is presented. The dissertation examines how human recognition theory serves as the foundation of the proposed hashing algorithm, then demonstrates how the algorithm is applied to anomaly detection, with spam email as the main example. Spam and phishing emails were chosen for the experiments because detailed experimental data are available and the test data can be analyzed in depth. The proposed algorithm is also compared with several popular commercial spam filters used by Google and Yahoo, and the outcome shows the benefits of the proposed approach.

The last chapter reviews the research, explains how the previous payload keyword approach evolved into the hash model solution, and discusses the possibility of extending hash model based anomaly detection to other areas, including Unicode applications.
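
The pipeline the abstract describes (hash sublexical units of neighboring words, build a hash frequency distribution, compare a message against a content-type profile) can be sketched as follows. This is only an illustration under stated assumptions, not the dissertation's algorithm: the abstract does not specify the hash function, so the midpoint word split, the pairing of adjacent half-words, the CRC32 hash, the bucket count `NUM_BUCKETS`, and the L1 `distance` measure are all hypothetical choices made for the example.

```python
import re
import zlib
from collections import Counter

NUM_BUCKETS = 64  # hypothetical table size; not taken from the dissertation


def split_word(word):
    """Split a word at its midpoint into left and right sublexical units.

    Assumption: a midpoint split stands in for the split-fovea idea.
    """
    mid = (len(word) + 1) // 2
    return word[:mid], word[mid:]


def hash_profile(text, num_buckets=NUM_BUCKETS):
    """Return a normalized hash frequency distribution for a text."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    # Assumption: one unit joins the right half of a word with the
    # left half of the word that follows it.
    for first, second in zip(words, words[1:]):
        unit = split_word(first)[1] + "|" + split_word(second)[0]
        bucket = zlib.crc32(unit.encode("utf-8")) % num_buckets
        counts[bucket] += 1
    total = sum(counts.values())
    if total == 0:
        # Fewer than two words: no neighboring pairs to hash.
        return [0.0] * num_buckets
    return [counts[b] / total for b in range(num_buckets)]


def distance(p, q):
    """L1 distance between two distributions; 0 means identical."""
    return sum(abs(a - b) for a, b in zip(p, q))


if __name__ == "__main__":
    # Made-up sample strings, used only to exercise the functions.
    spam_profile = hash_profile("win free money now click here to win free money")
    message = "click here now to win your free money"
    print(distance(hash_profile(message), spam_profile))
```

In this sketch a message would be flagged when its distance to the spam profile falls below some threshold. How the dissertation actually scores and thresholds messages is not stated in the abstract.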

Keywords/Search Tags:

Hash model, Detection, Sublexical unit, Approach, Spam, Application level

Related items

1	Investigation Of Spam Filtering Technologies Based On Statistical Approach
2	Acoustic modeling of sublexical units by using a structural-connectionist approach
3	Research On Spam Review Detection Based On Integrated Multi-feature
4	Research And Design Of General Accelerate Unit For Hash Function
5	A sender-centric approach to spam and phishing control
6	Research On Review Spam Detection Approach Based On Topic And Sentiment Analysis
7	Research On Shielding Mechanism Of Short Message Spam And Its Application
8	Research On Data-driven Detection Model Of Abnormal User Behavior In Mobile Internet And Its Application
9	Research On Deceptive Spam Review Detection For Combining Multi-features
10	Research And Implement On Hidden Web Spam Detection Technology