The Application Of Text Categorization In Short Message Filtering

Posted on: 2007-05-30
Degree: Master
Type: Thesis
Country: China
Candidate: Z J Wang
GTID: 2178360212957473
Subject: Signal and Information Processing
Abstract/Summary:
Because short messages are mobile, cheap, entertaining and convenient, people have gradually become accustomed to communicating with them. At the same time, trashy (spam) short messages have become a more severe problem. Statistics show that the number of trashy short messages has grown very quickly since 2001. Today, the average number of short messages that users receive each day has exceeded the normal level. The study of automatic short message filtering is therefore of great significance.

First, this thesis introduces the current state of trashy short messages and of anti-trashy-message technology, as well as the basic concepts and principles of short message filtering. Second, it analyzes and compares seven feature selection methods, four feature weight calculation methods and five representative text categorization algorithms. It then focuses on the principle of Bayesian classification and analyzes the limitations of the traditional Bayesian algorithm in short message filtering. In particular, misclassifying a legitimate short message can cause the user greater losses. On this basis, a minimum risk-based Bayesian filtering algorithm for short messages is adopted. Experimental results on a Chinese message corpus show that this algorithm classifies short messages correctly while also reducing false alarms on legitimate messages, and that it performs well when classifying and filtering short messages. Finally, the feedback and learning problems in a short message classification system are analyzed and discussed.

This thesis mainly covers the following:
(1) Because no open, standardized Chinese message corpus exists, a real, standardized corpus suitable for experiments is built.
(2) The current state of research on trashy message filtering is summarized, including the definition and origins of trashy messages and the commonly used filtering techniques.
(3) The theory and background of text categorization are introduced in detail. Commonly used feature selection methods, feature weight calculation methods and text categorization methods that suit message classification are compared.
(4) Among existing text classifiers, two typical ones, Naive Bayes (NB) and k-nearest neighbors (KNN), are run on the Chinese message corpus, and their experimental results and performance are analyzed and compared.
(5) The Bayesian algorithm is introduced in detail, and the limitations of the traditional Bayesian algorithm in message filtering are analyzed. A minimum risk-based Bayesian...
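
The minimum-risk idea described in the abstract can be illustrated with a small sketch. This is not the thesis's implementation, only a generic Naive Bayes filter with a cost-sensitive decision rule, written under several assumptions: messages are already segmented into tokens (Chinese word segmentation is out of scope here), word likelihoods use Laplace smoothing, and the names `train`, `log_joint`, `classify`, `cost_fp` and `cost_fn` are hypothetical. A message is labeled trashy only when the expected loss of doing so is lower than the expected loss of letting it through, that is, when cost_fn * P(spam | x) > cost_fp * P(ham | x).

```python
import math
from collections import Counter

def train(messages):
    """Count words per class. messages: list of (tokens, label), label in {"spam", "ham"}."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for tokens, label in messages:
        doc_counts[label] += 1
        word_counts[label].update(tokens)
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, doc_counts, vocab

def log_joint(tokens, label, model):
    """log P(label) + sum of log P(token | label), with Laplace (add-one) smoothing."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    total_words = sum(word_counts[label].values())
    score = math.log(doc_counts[label] / total_docs)
    for tok in tokens:
        if tok not in vocab:
            continue  # ignore words never seen in training
        score += math.log((word_counts[label][tok] + 1) / (total_words + len(vocab)))
    return score

def classify(tokens, model, cost_fp=1.0, cost_fn=1.0):
    """Minimum-risk decision between "spam" and "ham".

    cost_fp: assumed loss for flagging a legitimate message as spam.
    cost_fn: assumed loss for letting a spam message through.
    Choose "spam" iff cost_fn * P(spam|x) > cost_fp * P(ham|x),
    which in log space is a threshold on the log-odds.
    """
    log_odds = log_joint(tokens, "spam", model) - log_joint(tokens, "ham", model)
    return "spam" if log_odds > math.log(cost_fp / cost_fn) else "ham"

# Toy, made-up training data (not from the thesis corpus).
toy_corpus = [
    (["win", "prize", "now"], "spam"),
    (["free", "prize"], "spam"),
    (["meeting", "now"], "ham"),
    (["lunch", "meeting", "tomorrow"], "ham"),
]
model = train(toy_corpus)

print(classify(["prize", "now"], model))                # equal costs
print(classify(["prize", "now"], model, cost_fp=10.0))  # false alarms cost 10x more
```

With equal costs the rule reduces to the ordinary maximum-posterior Naive Bayes decision. Raising `cost_fp` moves the threshold so that borderline messages are kept as legitimate, which trades some missed trashy messages for fewer false alarms.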
Keywords/Search Tags:Trashy Message Filter, Text Classification, Simple Bayesian