
Research On Auto-learning Anti-spam Services With No-labeled

Posted on: 2011-07-27
Degree: Master
Type: Thesis
Country: China
Candidate: L Zhang
Full Text: PDF
GTID: 2178330332963515
Subject: Computer system architecture
Abstract/Summary:
With the development and spread of computer networking and communication technology, e-mail has become widely used. As the spam problem has grown along with it, research and development of anti-spam technology has received increasing attention.

A junk mail filtering model based on POP3 has already been implemented, but the stability and efficiency of its e-mail parsing and classification process are unsatisfactory. Among content-based filtering methods, rule-based junk mail filtering can classify spam without labeled samples, and ready-made rule bases are available to share. Its drawback is that the rules must be written by hand, and only after the features of a new kind of junk mail have appeared, so it often misses new spam. Among self-learning e-mail filtering systems, a junk mail filter based on co-training can improve classification performance and adapt to users' needs by using a small set of labeled samples together with a large set of unlabeled samples.

Against this background, we optimize the junk mail filtering model based on POP3 (SAMFUF) and propose a method that realizes auto-learning anti-spam services without labeled samples. The main work and contributions of the thesis are as follows:

(1) Starting from the implemented POP3-based junk mail filtering model, redesign the e-mail parsing process and optimize the dictionary and other data structures of the original model. Replace the modules' operations on text files with operations on in-memory variables, which avoids frequent disk I/O. Implement the e-mail parsing process as a static library so that the calling program is simpler.

(2) Build the classification program as a lib file and integrate it into the original POP3-based junk mail filtering model. Call the classification algorithm from a thread instead of starting the classification program as a separate process, which markedly improves the efficiency of mail classification.

(3) By studying the rule-based filtering method, we show that it can filter junk mail without labeled samples. By studying the filtering method based on co-training, we argue that it has a strong auto-learning ability and can gradually improve classification performance as it learns. We therefore propose a fusion of the rule-based and co-training-based methods that meets the requirement of auto-learning anti-spam services without labeled samples, and integrate it into the optimized POP3-based junk mail filtering model.

The experimental results show that the optimized junk mail filtering model is more efficient than the original model, and that the fusion of the rule-based and co-training methods meets the requirement of auto-learning anti-spam services without labeled samples and achieves better classification performance.
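The fusion idea in item (3) can be illustrated with a minimal sketch. The abstract does not give the rules, the feature views, or the base classifier, so everything named here is an assumption made for illustration: the keyword sets `SPAM_WORDS` and `HAM_WORDS`, the choice of subject and body as the two co-training views, the small naive Bayes classifier, and the helper names `rule_label`, `co_train`, and `classify`. Hand-written rules label the mails they recognize, and those labels seed a co-training loop in which each view's classifier labels its most confident remaining mail for the shared training set.

```python
import math
from collections import Counter

# Hypothetical hand-written rules; the thesis's real rule base is not given.
SPAM_WORDS = {"viagra", "lottery", "winner"}
HAM_WORDS = {"meeting", "agenda", "invoice"}

def rule_label(mail):
    """Return 'spam', 'ham', or None when the rules do not decide."""
    words = set((mail["subject"] + " " + mail["body"]).lower().split())
    spam_hit, ham_hit = words & SPAM_WORDS, words & HAM_WORDS
    if spam_hit and not ham_hit:
        return "spam"
    if ham_hit and not spam_hit:
        return "ham"
    return None

class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing over one text view."""
    def __init__(self, view):
        self.view = view

    def fit(self, mails, labels):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = Counter(labels)
        for mail, label in zip(mails, labels):
            self.counts[label].update(mail[self.view].lower().split())
        self.totals = {c: sum(self.counts[c].values()) for c in self.counts}
        # +1 reserves probability mass for words never seen in training.
        self.v = len(set(self.counts["spam"]) | set(self.counts["ham"])) + 1
        return self

    def log_odds(self, mail):
        """Positive score: spam more likely; negative: ham more likely."""
        score = math.log((self.docs["spam"] + 1) / (self.docs["ham"] + 1))
        for w in mail[self.view].lower().split():
            p_spam = (self.counts["spam"][w] + 1) / (self.totals["spam"] + self.v)
            p_ham = (self.counts["ham"][w] + 1) / (self.totals["ham"] + self.v)
            score += math.log(p_spam / p_ham)
        return score

def co_train(mails, rounds=3, per_round=1, margin=0.5):
    """Seed labels from rules, then let two views label the rest."""
    labeled, labels, pool = [], [], []
    for mail in mails:
        label = rule_label(mail)
        if label is None:
            pool.append(mail)
        else:
            labeled.append(mail)
            labels.append(label)
    views = [NaiveBayes("subject"), NaiveBayes("body")]
    for _ in range(rounds):
        for clf in views:
            clf.fit(labeled, labels)
            scored = sorted(((clf.log_odds(m), m) for m in pool),
                            key=lambda pair: abs(pair[0]), reverse=True)
            for score, mail in scored[:per_round]:
                if abs(score) >= margin:  # keep only confident predictions
                    labeled.append(mail)
                    labels.append("spam" if score > 0 else "ham")
                    pool.remove(mail)
    for clf in views:
        clf.fit(labeled, labels)
    return views

def classify(views, mail):
    """Combine both views by summing their log-odds."""
    return "spam" if sum(clf.log_odds(mail) for clf in views) > 0 else "ham"

# Toy, made-up mails: the first two trigger a rule, the last two do not.
MAILS = [
    {"subject": "lottery winner", "body": "claim your free prize now"},
    {"subject": "team meeting", "body": "agenda attached for monday"},
    {"subject": "free prize", "body": "click now to claim cash"},
    {"subject": "monday schedule", "body": "notes attached for monday review"},
]

if __name__ == "__main__":
    trained = co_train(MAILS)
    print(classify(trained, {"subject": "free cash prize", "body": "claim now"}))
```

In this sketch no mail is labeled by a person: the rules supply the only initial supervision, and the confidence threshold `margin` (also an assumed parameter) limits how many noisy self-assigned labels enter the training set.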
Keywords: anti-spam, rule, co-training, mail filtering