Design And Implementation Of DTFS Algorithm For Spam Filtering Of University OA System

Posted on: 2016-02-29
Degree: Master
Type: Thesis
Country: China
Candidate: Y Yu
GTID: 2348330512971407
Subject: Software engineering
Abstract/Summary:
In recent years, criminals have used email to distribute advertising and harmful information, and its spread has done considerable harm to society. Spam is also common on the internal office networks of universities and colleges, where it causes serious adverse effects and degrades the user experience. It disrupts people's daily lives and may also undermine social stability. Spam filtering in the university office workflow has therefore become a problem that cannot be ignored and a focus of attention for developers of college application software.

Against this background, this thesis studies the mail filtering subsystem of a university office system, together with content-based spam filtering technology. It first describes the components of a complete college office spam filtering model and the function of each part. The model has five parts: a text pre-processing model, a feature dimension reduction model, a text representation model, a classifier model, and a result evaluation model. After a detailed analysis of the principles and mainstream techniques behind each part, the thesis studies feature dimension reduction algorithms in depth, analyzes the importance of feature dimension reduction for text classification, and proposes an improved feature selection algorithm.

A careful analysis of the text classification model shows that feature dimension reduction is an integral part of a classification system. A huge feature space places a heavy processing burden on the computer, and it also contains a great deal of redundant information that seriously affects the final classification. The goal of feature reduction is to shrink the dimension of the feature space as much as possible without degrading classifier performance. There are two traditional approaches: feature extraction and feature selection. The former transforms the feature space, mapping the original space to a new low-dimensional space by certain rules while minimizing the loss of feature information. The latter uses certain rules to extract from the original feature set a subset that represents the original set to the greatest possible extent. Experiments confirm that a good feature reduction algorithm can not only greatly reduce the dimension of the feature space but also improve classifier performance to some extent.

On the basis of the improved algorithm, the thesis builds a complete feature selection model for the college office mail filtering subsystem, which includes word segmentation, stop-word removal, stemming, and feature selection. In the text categorization step, the well-known data mining software Weka is used, and its classification output is analyzed against that of conventional feature selection algorithms. Based on the experimental results of filtering college office mail, the time and space complexity of the proposed algorithm are analyzed and found to be no greater than those of traditional feature selection algorithms. The experiments compare the proposed method with document frequency, mutual information, information gain, and the chi-square statistic, using recall, precision, and F1 as evaluation measures. The proposed feature selection algorithm not only performs better but also has lower computational complexity. The resulting college office spam filtering system can filter spam effectively.
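
To make the comparison concrete, the following is a minimal Python sketch of two of the baseline criteria named above (document frequency and the chi-square statistic) together with the precision, recall, and F1 measures. It is not the DTFS algorithm, which the abstract does not define. The function names, the "spam"/"ham" labels, and the toy messages are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def document_frequency(docs):
    """Count, for each term, how many documents contain it."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # set(): count each term once per document
    return df


def chi_square(docs, labels, term, target="spam"):
    """Chi-square statistic of `term` against class `target` (2x2 table)."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_class = label == target
        if has_term and in_class:
            a += 1  # term present, target class
        elif has_term:
            b += 1  # term present, other class
        elif in_class:
            c += 1  # term absent, target class
        else:
            d += 1  # term absent, other class
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def select_features(docs, labels, k, target="spam"):
    """Keep the k terms with the highest chi-square score (ties: A-Z)."""
    vocab = sorted(document_frequency(docs))
    scored = [(chi_square(docs, labels, t, target), t) for t in vocab]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [term for _, term in scored[:k]]


def precision_recall_f1(y_true, y_pred, target="spam"):
    """Precision, recall, and F1 for the `target` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == target and p == target)
    fp = sum(1 for t, p in pairs if t != target and p == target)
    fn = sum(1 for t, p in pairs if t == target and p != target)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical, already tokenized toy messages.
docs = [
    ["free", "prize", "meeting"],
    ["free", "offer"],
    ["meeting", "schedule"],
    ["schedule", "exam"],
]
labels = ["spam", "spam", "ham", "ham"]

top_terms = select_features(docs, labels, k=2)
scores = precision_recall_f1(labels, ["spam", "ham", "ham", "spam"])
```

On this toy data, "meeting" appears equally often in both classes and scores zero, while "free" and "schedule" each occur in only one class and rank highest. Chi-square is symmetric in this sense: a term that signals legitimate mail scores as well as one that signals spam.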
Keywords/Search Tags: Feature Selection, Text Classification, Spam Filtering, DTFS algorithm