
Design And Implementation Of Spam Messages Classification System

Posted on: 2016-12-27
Degree: Master
Type: Thesis
Country: China
Candidate: J Chen
GTID: 2428330491960035
Subject: Electronics and Communications Engineering
Abstract/Summary:
The Internet and the rapid development of information technology have opened a new era of big data, greatly increasing both the speed and the breadth of information dissemination. SMS, as one carrier of that information, is preferred by users of all ages. At the same time, the rapid growth of the short message service has also disrupted and disturbed many people's lives. Building on the current state of research and development at home and abroad, and drawing on an actual business project, this thesis designs a classification system for spam messages. The system trains on SMS messages that have already been classified in order to generate a classifier, then uses that classifier to classify the messages awaiting classification. To improve the system's performance and classification accuracy, this thesis studies the following two aspects:

1) Spam messages contain many character variants, such as traditional characters, phonetic spellings, visually similar characters, and sensitive words interleaved with special symbols. The system addresses this by improving the Naive Bayes algorithm. Its word segmentation is based on statistical probability rather than on a traditional dictionary. First, special symbols are removed and traditional characters are converted to simplified ones, so that these variants do not degrade segmentation. Then the probability that adjacent characters occur together is calculated: if the co-occurrence probability is high, the characters are treated as belonging to one word; otherwise they are not.

2) Spam messages are classified with multi-level classification. Experiments show that classifying level by level is more accurate than traditional classification, so this thesis adopts multi-level classification. First, a category tree is built to define the levels of the categories. Then classification proceeds from level zero to the highest level of the category tree.

Finally, this thesis compares the spam message classification system with a traditional information classification system. The results show that the proposed system is better suited to classifying spam messages, improves classification performance, and achieves the desired objectives.
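
The statistical segmentation step described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the tiny `TRAD_TO_SIMP` table, the toy corpus, the `threshold` value, and the choice to estimate co-occurrence as count(ab) / count(a) are all assumptions made here for illustration.

```python
from collections import Counter

# Hypothetical, deliberately tiny traditional-to-simplified table;
# a real system would need a full conversion table.
TRAD_TO_SIMP = {"開": "开", "發": "发"}


def normalize(text):
    """Drop special symbols, then map traditional characters to simplified."""
    kept = (ch for ch in text if ch.isalnum())
    return "".join(TRAD_TO_SIMP.get(ch, ch) for ch in kept)


def count_adjacent(corpus):
    """Count single characters and adjacent character pairs in the corpus."""
    chars, pairs = Counter(), Counter()
    for text in corpus:
        clean = normalize(text)
        chars.update(clean)
        pairs.update(zip(clean, clean[1:]))
    return chars, pairs


def segment(text, chars, pairs, threshold=0.6):
    """Join adjacent characters whose co-occurrence estimate is high enough."""
    clean = normalize(text)
    if not clean:
        return []
    words, current = [], clean[0]
    for a, b in zip(clean, clean[1:]):
        # Assumed estimate: P(b follows a) = count(ab) / count(a).
        p = pairs[(a, b)] / chars[a] if chars[a] else 0.0
        if p >= threshold:
            current += b           # likely the same word: extend it
        else:
            words.append(current)  # likely a word boundary: start a new word
            current = b
    words.append(current)
    return words


# Toy corpus, invented for this example.
corpus = ["发票代开", "代开发票", "发票优惠", "优惠活动"]
chars, pairs = count_adjacent(corpus)
print(segment("代*開發#票优惠", chars, pairs))
```

With this toy corpus, the obfuscated input `代*開發#票优惠` is first normalized to `代开发票优惠` and then split into `['代开', '发票', '优惠']`. Characters never seen in the corpus fall back to single-character words, which is one reason a dictionary-free approach of this kind depends on having enough training text.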
Keywords/Search Tags: Naive Bayes classifier, spam messages, word segmentation, data mining