Research In Filtering Of Short Message Service Based On Content Mining

Posted on: 2008-07-09 | Degree: Master | Type: Thesis
Country: China | Candidate: M L He | Full Text: PDF
GTID: 2178360215979982 | Subject: Software engineering
Abstract/Summary:
Short Message Service (SMS) has become one of the fastest and most economical means of communication. At the same time, the growing volume of junk SMS (also called "spam SMS") has created a need for SMS filtering, a task that is receiving increasing attention. Current anti-spam measures commonly rely on blacklists or whitelists, manually written rules, and keyword-based content filtering. To address the shortcomings of these traditional methods and the practical problems of SMS filtering, this thesis applies automated text categorization and information filtering. Text categorization algorithms such as Naive Bayes, kNN, decision trees, and boosting can all be applied to information filtering.

We developed a new spam SMS filtering system based on an improved Bayesian algorithm. The algorithm uses a minimum-risk strategy and learns from a user-supplied training set of spam and normal SMS messages. The system comprises SMS preprocessing, Chinese word segmentation, feature extraction, classification, and filtering. The main functions and algorithms are discussed together with Java source code. Finally, we conducted an experiment to test the accuracy of the software in classifying Chinese web documents. The experimental results show that the software achieves high accuracy.

The contents of this thesis are as follows:
(1) A summary of the state of spam SMS filtering.
(2) The overall design of the automatic SMS text classifier, including the main function of each module and the new methods we propose.
(3) Chinese word segmentation. Based on an analysis of existing Chinese segmentation algorithms, we discuss how to improve the maximum matching segmentation algorithm, and describe a Chinese dictionary based on a hash table.
(4) A comparison of feature selection algorithms, with a summary of their advantages and disadvantages. We propose a new algorithm named DFTF (Document Frequency and Term Frequency) and give its implementation in Java source code.
(5) The Naive Bayes machine learning method, in particular how to classify Chinese web documents with it and how to implement such a classifier.
(6) How to evaluate the quality of a Chinese web document classifier. The experimental results show that this classifier achieves high classification quality. We also summarize the strengths and weaknesses of the project and discuss how to improve the classifier in future research.
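As an illustration of the minimum-risk Bayesian strategy mentioned in the abstract, the following is a minimal sketch and not the thesis's actual code, which is not reproduced on this page. It assumes a multinomial Naive Bayes model with Laplace smoothing and a single hypothetical parameter, falsePositiveCost, that states how much worse it is to block a normal message than to let a spam message through. The class name, method names, and the toy English tokens are all assumptions; in the described system the tokens would come from the Chinese word segmentation step.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: Naive Bayes spam filter with a minimum-risk decision rule. */
final class MinRiskNaiveBayes {
    private final Map<String, Integer> spamCounts = new HashMap<>();
    private final Map<String, Integer> normalCounts = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int spamTokens, normalTokens, spamDocs, normalDocs;
    // Assumed cost of blocking a normal message, relative to a cost of 1
    // for delivering a spam message.
    private final double falsePositiveCost;

    MinRiskNaiveBayes(double falsePositiveCost) {
        if (falsePositiveCost <= 0) {
            throw new IllegalArgumentException("cost must be positive");
        }
        this.falsePositiveCost = falsePositiveCost;
    }

    /** Adds one already-segmented message to the training counts. */
    public void train(List<String> tokens, boolean isSpam) {
        for (String token : tokens) {
            vocabulary.add(token);
            if (isSpam) {
                spamCounts.merge(token, 1, Integer::sum);
                spamTokens++;
            } else {
                normalCounts.merge(token, 1, Integer::sum);
                normalTokens++;
            }
        }
        if (isSpam) spamDocs++; else normalDocs++;
    }

    /** log P(class) + sum of log P(token | class), with Laplace smoothing. */
    private double logScore(List<String> tokens, Map<String, Integer> counts,
                            int classTokens, int classDocs) {
        double score = Math.log((double) classDocs / (spamDocs + normalDocs));
        for (String token : tokens) {
            int count = counts.getOrDefault(token, 0);
            score += Math.log((count + 1.0) / (classTokens + vocabulary.size()));
        }
        return score;
    }

    /**
     * Minimum-risk rule: block the message only if the expected loss of
     * blocking, falsePositiveCost * P(normal | x), is smaller than the
     * expected loss of delivering, 1 * P(spam | x). The shared normalizer
     * P(x) cancels, so unnormalized log scores are compared.
     */
    public boolean isSpam(List<String> tokens) {
        double logSpam = logScore(tokens, spamCounts, spamTokens, spamDocs);
        double logNormal = logScore(tokens, normalCounts, normalTokens, normalDocs);
        return logSpam > logNormal + Math.log(falsePositiveCost);
    }

    public static void main(String[] args) {
        MinRiskNaiveBayes filter = new MinRiskNaiveBayes(5.0);
        filter.train(Arrays.asList("free", "prize", "call"), true);
        filter.train(Arrays.asList("free", "loan", "call"), true);
        filter.train(Arrays.asList("meeting", "at", "noon"), false);
        filter.train(Arrays.asList("call", "me", "at", "noon"), false);
        System.out.println(filter.isSpam(Arrays.asList("free", "prize")));
        System.out.println(filter.isSpam(Arrays.asList("meeting", "at", "noon")));
    }
}
```

Setting falsePositiveCost to 1 reduces the rule to the ordinary maximum-posterior decision. Larger values make the filter more conservative, so a message is blocked only when the evidence for spam is correspondingly stronger.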
Keywords/Search Tags: Short Message Filtering, Chinese text splitter, text categorization, Naive Bayes machine learning