Research In Filtering Of Short Message Service Based On Content Mining

Posted on:2008-07-09

Degree:Master

Type:Thesis

Country:China

Candidate:M L He

Full Text:PDF

GTID:2178360215979982

Subject:Software engineering

Abstract/Summary:

Short Message Service (SMS) is becoming one of the fastest and most economical means of communication. At the same time, the growing volume of junk SMS (also called "spam SMS") has created a need for SMS filtering, a task that affects everyday life and is receiving increasing attention. Current anti-spam measures commonly include blacklists and whitelists, manually written rules, and keyword-based content filtering. To address the shortcomings of these traditional methods and the practical problems of SMS filtering, this thesis proposes applying automated text categorization and information filtering. Text categorization algorithms such as Naive Bayes, kNN, decision trees, and boosting can all be applied to information filtering.

In this thesis, we develop a new spam SMS filtering system based on an improved Bayesian method. A minimum-risk strategy is applied to the Bayesian algorithm, which learns from a user-supplied training set of spam and normal SMS messages. The model comprises SMS preprocessing, Chinese word segmentation, feature extraction, categorization, and filtering. The main functions and algorithms are discussed together with their Java source code. Finally, we conduct an experiment to test the accuracy of the software in categorizing Chinese web documents. The experimental results show that the software achieves high accuracy.

The contents of this thesis are as follows:

(1) A survey of the current state of spam SMS filtering.
(2) The overall design of the automatic SMS text classifier. The main function of each module is discussed, together with the new methods we propose.
(3) Chinese word segmentation. Based on an analysis of existing Chinese word segmentation algorithms, we discuss how to improve the maximum-matching segmentation algorithm. A Chinese dictionary based on a hash table is also discussed.
(4) A comparison of feature selection algorithms, with a summary of their advantages and disadvantages. We propose a new algorithm named DFTF (Document Frequency and Term Frequency) and give its implementation in Java source code.
(5) The Naive Bayes machine learning method. In particular, we discuss the algorithm for categorizing Chinese web documents with Naive Bayes, and then present how to implement such a classifier.
(6) How to evaluate the quality of a Chinese web document classifier. The experimental results show that this classifier achieves high categorization quality. We also summarize the strengths and weaknesses of this project and discuss how the classifier could be improved in future research.
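Two of the steps named in the abstract, dictionary-based maximum-matching segmentation and minimum-risk Bayesian classification, can be illustrated with a short sketch. This is not the thesis's code, which the abstract does not reproduce. It assumes plain forward maximum matching against a hash-set dictionary and a standard multinomial Naive Bayes with Laplace smoothing; the class name `SmsFilterSketch`, the method names, and the cost parameters are all hypothetical. ASCII strings stand in for Chinese text, since maximum matching only relies on the input having no word delimiters.

```java
import java.util.*;

class SmsFilterSketch {

    /** Forward maximum matching: at each position, take the longest dictionary word. */
    static List<String> segment(String text, Set<String> dictionary, int maxWordLen) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int end = Math.min(text.length(), i + maxWordLen);
            // Shrink the candidate until it is a dictionary word or a single character.
            while (end > i + 1 && !dictionary.contains(text.substring(i, end))) {
                end--;
            }
            tokens.add(text.substring(i, end));
            i = end;
        }
        return tokens;
    }

    /** Multinomial Naive Bayes with Laplace smoothing and a cost-weighted decision. */
    static class Classifier {
        private final Map<String, Integer> spamCounts = new HashMap<>();
        private final Map<String, Integer> hamCounts = new HashMap<>();
        private final Set<String> vocabulary = new HashSet<>();
        private int spamDocs, hamDocs, spamTokens, hamTokens;

        void train(List<String> tokens, boolean isSpam) {
            Map<String, Integer> counts = isSpam ? spamCounts : hamCounts;
            for (String t : tokens) {
                counts.merge(t, 1, Integer::sum);
                vocabulary.add(t);
            }
            if (isSpam) { spamDocs++; spamTokens += tokens.size(); }
            else { hamDocs++; hamTokens += tokens.size(); }
        }

        /** Posterior P(spam | tokens), accumulated in log space to avoid underflow. */
        double spamProbability(List<String> tokens) {
            double total = spamDocs + hamDocs + 2.0;
            double logSpam = Math.log((spamDocs + 1.0) / total);
            double logHam = Math.log((hamDocs + 1.0) / total);
            int v = Math.max(1, vocabulary.size());
            for (String t : tokens) {
                logSpam += Math.log((spamCounts.getOrDefault(t, 0) + 1.0) / (spamTokens + v));
                logHam += Math.log((hamCounts.getOrDefault(t, 0) + 1.0) / (hamTokens + v));
            }
            return 1.0 / (1.0 + Math.exp(logHam - logSpam));
        }

        /** Minimum-risk rule: block only if blocking has the lower expected loss. */
        boolean isSpam(List<String> tokens, double costFalsePositive, double costFalseNegative) {
            double p = spamProbability(tokens);
            double riskBlock = costFalsePositive * (1.0 - p);   // loss if the message was normal
            double riskDeliver = costFalseNegative * p;         // loss if the message was spam
            return riskBlock < riskDeliver;
        }
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<>(Arrays.asList("free", "prize", "meeting", "today"));
        Classifier c = new Classifier();
        c.train(segment("freeprize", dict, 7), true);
        c.train(segment("meetingtoday", dict, 7), false);
        List<String> msg = segment("free", dict, 7);
        System.out.println(msg + " p(spam)=" + c.spamProbability(msg));
        System.out.println("equal costs: " + c.isSpam(msg, 1.0, 1.0));
        System.out.println("costly false positives: " + c.isSpam(msg, 5.0, 1.0));
    }
}
```

Raising `costFalsePositive` relative to `costFalseNegative` makes the filter more reluctant to block a message, which is the usual reason for preferring a minimum-risk decision over a plain maximum-posterior one when losing a legitimate message is worse than letting a spam message through.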

Keywords/Search Tags:

Short Message Filtering, Chinese text splitter, text categorization, Naive Bayes machine learning

Related items

1	Chinese WEB Document Automatic Categorization
2	Design And Implementation Of Short Message Classification System Based On Naive Bayesian
3	A Study On Text Categorization Based On Machine Learning
4	Classification System Based On The Theme Of Information Acquisition In The Pages
5	The Research And Application Of Text Categorization Arithmetic In Spam Filtering
6	Text Categorization Based On Naive Bayes Method
7	The Study Of Chinese Text Categorization Based On Concept
8	The Study Of Chinese Text Categorization Based On Naïve Bayes
9	Study And Realization Of Text Categorization In Chinese Speech Recognition Results
10	Incremental Learning Of Naive Bayes Chinese Classification System