Research On Mail Classification Technique Based On Decision Tree

Posted on:2008-06-19

Degree:Master

Type:Thesis

Country:China

Candidate:F Y Li

Full Text:PDF

GTID:2178360215487617

Subject:Computational Mathematics

Abstract/Summary:


With the rapid development of the Internet, e-mail has come into ever wider use. On the one hand, e-mail provides an economical, convenient, and fast service; on the other hand, it gives some businesses and lawbreakers an opportunity to advertise and to carry out illegal activities. China's first anti-spam survey report, released in 2006, shows that from November 2005 to February 2006 spam accounted for 63.97% of the mail received by Chinese Internet users, and that spam costs the national economy 6.3 billion RMB annually.

Classifying and filtering e-mail is the main effective means of dealing with spam. Current filtering techniques fall into two main kinds: filtering by mail address and filtering by mail content. Both lack intelligence and adaptability. It is therefore of considerable practical significance to study how to identify spam characteristics as mail continuously changes, how to automatically establish and update spam signatures and filter rules, and how to apply them intelligently in a mail filtering system.

This thesis studies mail classification techniques. The main work is as follows:

1. Analyze the types of spam that are likely to appear, and study in depth the current state of mail classification technology in China and abroad, especially mail classification based on decision trees.

2. Propose an improved attribute selection criterion based on the contribution of the test attribute to classification. When computing an attribute's contribution to classification at each internal node, the method draws its data only from the partition belonging to the parent node's branch, not from the whole training database. Compared with the original criterion, the resulting decision tree is shallower and more accurate.

3. Prove theoretically that the method has no variety bias. Compared with the information entropy selection criterion, it is more efficient.

4. Build a mail classification model based on decision trees and implement a simulator of the model. The model is self-adaptive and self-learning. When features appear that differ from those in the historical training database, they are counted and stored in the mail feature vector database. When the number of changed feature vectors reaches a certain value, the generation of mail rules is started.
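The abstract does not give the formula of the improved selection criterion, so the following is only a minimal sketch of the idea described in item 2: attributes at an internal node are scored on the partition reached through the parent's branch rather than on the whole training set. It uses standard information gain (the entropy criterion that the abstract names as its point of comparison) as a stand-in score. The feature names `has_link`, `known_sender`, and `all_caps`, the toy data, and all function names are hypothetical and do not come from the thesis.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting the given rows on `attr`."""
    total = len(rows)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr], []).append(label)
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attrs):
    """Attribute with the highest score on exactly the rows passed in."""
    return max(attrs, key=lambda a: information_gain(rows, labels, a))


def branch_subset(rows, labels, attr, value):
    """Rows and labels that follow the branch `attr == value`."""
    keep = [i for i, row in enumerate(rows) if row[attr] == value]
    return [rows[i] for i in keep], [labels[i] for i in keep]


# Hypothetical toy training set: one dict of binary features per mail.
rows = [
    {"has_link": 1, "known_sender": 0, "all_caps": 1},
    {"has_link": 1, "known_sender": 0, "all_caps": 0},
    {"has_link": 1, "known_sender": 1, "all_caps": 0},
    {"has_link": 0, "known_sender": 1, "all_caps": 0},
    {"has_link": 0, "known_sender": 0, "all_caps": 1},
    {"has_link": 0, "known_sender": 0, "all_caps": 0},
    {"has_link": 0, "known_sender": 1, "all_caps": 1},
    {"has_link": 1, "known_sender": 0, "all_caps": 1},
]
labels = ["spam", "spam", "ham", "ham", "ham", "ham", "ham", "spam"]
attrs = ["has_link", "known_sender", "all_caps"]

# The root attribute is chosen on the whole training set.
root = best_attribute(rows, labels, attrs)

# The child attribute is chosen only on the parent's branch partition.
sub_rows, sub_labels = branch_subset(rows, labels, root, 1)
remaining = [a for a in attrs if a != root]
child = best_attribute(sub_rows, sub_labels, remaining)

print(root, child)
```

Restricting the score to the branch partition means that each node only scans the mails that actually reach it, which is one plausible reading of the efficiency claim in item 3; the thesis itself should be consulted for the actual criterion.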

Keywords/Search Tags:

Mail Classification, Decision Tree, variety bias, Model


Related items

1	Decision Tree Classification Algorithm And Its Application
2	Researches On Decision Tree Algorithm And Its Application To Discovery Of Query Interface
3	The Research On The Algorithms Of Optimizing Decision Tree Classification
4	Research On High-score Image Classification Based On Decision Tree Model
5	Inductive Decision Tree Classification Model In The Military Transport Vehicle Management System
6	Based On Decision Tree Classification Method
7	Research On The ID3 Algorithms Of Decision Tree
8	The Decision Tree Classification Method Is Applied Research In The Insurance Business
9	Research On Automatic Classification And Prediction Of Archives Based On Decision Tree
10	Mail Content Classification Method Based On SVM