Research On Mail Classification Technique Based On Decision Tree

Posted on: 2008-06-19
Degree: Master
Type: Thesis
Country: China
Candidate: F Y Li
GTID: 2178360215487617
Subject: Computational Mathematics
Abstract/Summary:
With the rapid development of Internet applications, e-mail has come into ever wider use. On the one hand, e-mail gives people an economical, convenient and fast service; on the other hand, it gives some businesses and lawbreakers an opportunity to use it for illegal activity and advertising. The first anti-spam survey report on the state of spam in China, published in 2006, shows that from November 2005 to February 2006 spam made up 63.97% of the mail received by Chinese Internet users, and that spam costs the national economy 6.3 billion RMB annually.

Classifying and filtering e-mail is the main effective means of dealing with spam. Current filtering techniques fall mainly into two kinds: filtering by mail address and filtering by mail content. Both kinds lack intelligence and adaptability. It is therefore of considerable practical significance to study how to identify spam characteristics as mail continuously changes, how to establish and update spam signatures and filter rules automatically, and how to apply them intelligently in new mail filtering systems.

This thesis studies mail classification techniques. The main work is as follows:

1. Analyze the types of spam likely to appear, and study in depth the current state of mail classification techniques in China and abroad, especially mail classification based on decision trees.
2. Propose an improved attribute selection criterion based on the contribution of a test attribute to classification. When computing an attribute's contribution to classification at each internal node, the method uses only the partition of the data on the parent node's branch, not the whole training database. Compared with the original criterion, the resulting decision tree is shallower and its accuracy is higher.
3. Prove theoretically that the method has no variety bias. Compared with the selection criterion based on information entropy, it is more efficient.
4. Build a mail classification model based on decision trees and implement a simulator of the model. The model is self-adaptive and self-learning. When characteristics appear that are new and different from those in the historical training database, the characteristics and their total counts are both recorded in the mail characteristic vector database. When the number of changed characteristic vectors reaches a certain value, the generation of mail rules is started.
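
The node-local selection described in item 2 can be illustrated with a minimal sketch. The abstract does not give the thesis's improved contribution measure, so this sketch substitutes the information-entropy (information gain) criterion that the abstract names as the comparison baseline. What it shows is only the scoping idea: at each internal node, attributes are scored on the rows that reached that node through the parent's branch, not on the whole training set. The feature names (`has_link`, `has_money_word`), the toy data and all function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction from splitting the given rows on attribute `attr`."""
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr], []).append(label)
    remainder = sum(len(p) / len(labels) * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attrs):
    """Pick the highest-gain attribute, scored only on this node's rows."""
    return max(attrs, key=lambda a: information_gain(rows, labels, a))


def build_tree(rows, labels, attrs):
    """Grow a tree recursively; each child sees only its parent's branch partition."""
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    attr = best_attribute(rows, labels, attrs)
    remaining = [a for a in attrs if a != attr]
    node = {"attr": attr, "children": {}}
    for value in sorted({row[attr] for row in rows}):
        subset = [(r, l) for r, l in zip(rows, labels) if r[attr] == value]
        sub_rows = [r for r, _ in subset]
        sub_labels = [l for _, l in subset]
        # The recursive call receives only the rows on this branch.
        node["children"][value] = build_tree(sub_rows, sub_labels, remaining)
    return node


# Hypothetical toy data: binary features extracted from four mails.
rows = [
    {"has_link": 1, "has_money_word": 1},
    {"has_link": 1, "has_money_word": 0},
    {"has_link": 0, "has_money_word": 1},
    {"has_link": 0, "has_money_word": 0},
]
labels = ["spam", "ham", "spam", "ham"]
tree = build_tree(rows, labels, ["has_link", "has_money_word"])
```

In this toy set `has_money_word` separates the two classes perfectly, so it is chosen at the root and both children become leaves. Any other contribution measure could be swapped in for `information_gain` without changing how the rows are scoped to each node.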
Keywords/Search Tags: Mail Classification, Decision Tree, Variety Bias, Model