
Design And Implementation Of The Spam Filtering System

Posted on: 2010-09-30
Degree: Master
Type: Thesis
Country: China
Candidate: X J He
GTID: 2208360275983754
Subject: Computer software and theory
Abstract/Summary:
In recent years, as the number of mobile phone users in China has grown, the short message service (SMS) business has developed rapidly, and the volume of junk SMS has risen just as quickly. The main junk SMS filtering techniques at present are blacklist filtering, keyword filtering, and content filtering based on text categorization. Blacklist and keyword filtering can screen junk SMS quickly, but their accuracy is low. Filtering based on text classification depends mainly on the quantity and quality of the training samples. Because everyday messages involve personal privacy and message centers never publish them, few message samples are available, so filtering that relies on text classification alone is not very accurate. In addition, existing filtering techniques inspect messages one by one. For a message center that handles a large number of short messages every day, the computation is enormous and can congest the message service center's network. The accuracy and efficiency of existing junk SMS filtering therefore still cannot meet practical needs.

This thesis addresses these shortcomings by studying effective ways to resolve or mitigate them, introducing new techniques, and combining filtering techniques in an original way so that the junk SMS filtering system achieves better filtering performance. The main work is as follows:

1. Studied the keyword search algorithm used in the junk SMS filtering system, which applies the multi-pattern matching idea of the WM algorithm. The algorithm builds hash tables in a pre-processing step to speed up matching and, based on the characteristics of junk SMS, organizes the patterns in a compressed trie to speed up searching.
2. Analyzed the main text classification techniques, focusing on the application of the minimum-risk Bayesian classifier to junk SMS.
3. Proposed using log analysis on messages that have already been filtered to extract useful data and update the keyword library and the classification training sample library, making the system self-optimizing.
4. Combined Bayesian classification with the new filtering methods (flow testing, sample testing, and log analysis) to improve filtering efficiency while maintaining the accuracy of junk SMS filtering.

Finally, the design and implementation of the entire junk SMS filtering system are presented.
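
The abstract names a minimum-risk Bayesian classifier but gives no details of it. The following is only a minimal sketch of how such a decision rule could look, assuming a multinomial naive Bayes model with Laplace smoothing over pre-tokenized messages. The class name, the labels "junk" and "ham", and the cost values are hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter


class MinRiskNaiveBayes:
    """Illustrative naive Bayes filter with a minimum-risk decision rule."""

    def __init__(self, cost_false_positive=9.0, cost_false_negative=1.0):
        # Hypothetical costs: blocking a legitimate message is assumed to be
        # nine times worse than delivering a junk message.
        self.cost_fp = cost_false_positive
        self.cost_fn = cost_false_negative
        self.word_counts = {"junk": Counter(), "ham": Counter()}
        self.doc_counts = {"junk": 0, "ham": 0}
        self.vocab = set()

    def train(self, tokens, label):
        """Add one tokenized message with label 'junk' or 'ham'."""
        self.word_counts[label].update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def _log_joint(self, tokens, label):
        # log P(label) + sum of log P(token | label), with add-one smoothing.
        # Assumes at least one training message exists for each label.
        total_docs = sum(self.doc_counts.values())
        log_p = math.log(self.doc_counts[label] / total_docs)
        total_words = sum(self.word_counts[label].values())
        vocab_size = len(self.vocab)
        for token in tokens:
            count = self.word_counts[label][token]
            log_p += math.log((count + 1) / (total_words + vocab_size))
        return log_p

    def posterior_junk(self, tokens):
        """Return P(junk | tokens), normalized in a numerically stable way."""
        log_junk = self._log_joint(tokens, "junk")
        log_ham = self._log_joint(tokens, "ham")
        shift = max(log_junk, log_ham)
        p_junk = math.exp(log_junk - shift)
        p_ham = math.exp(log_ham - shift)
        return p_junk / (p_junk + p_ham)

    def is_junk(self, tokens):
        """Block only if blocking has lower expected cost than delivering."""
        p = self.posterior_junk(tokens)
        risk_of_blocking = self.cost_fp * (1.0 - p)   # message may be legitimate
        risk_of_delivering = self.cost_fn * p          # message may be junk
        return risk_of_blocking < risk_of_delivering
```

With these costs a message is blocked only when its junk posterior exceeds 0.9, so the filter lets borderline messages through instead of risking the loss of a legitimate one. Setting both costs to 1 reduces the rule to the usual maximum-posterior decision.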
Keywords/Search Tags: junk SMS, Bayesian classification, keyword filtering, log analysis