Research and Implementation of SMS Automatic Classification

Posted on: 2009-08-02
Degree: Master
Type: Thesis
Country: China
Candidate: J Chen
GTID: 2178360245969325
Subject: Software engineering
Abstract/Summary:
Automatic classification of short-message texts is the key technology behind short-message monitoring systems and the analysis of service user behavior. This thesis develops a Corpus-Based Statistics-Oriented (CBSO) methodology for classifying short-message texts. Compared with ordinary text classification methods, this methodology has five distinguishing features. The Vector Space Model is used to represent short-message texts. Because many short-message texts are problematic texts, the feature granularity is mainly the Chinese character and the English word, with only a few features being Chinese words. As for the structure of the classifier, a flat classifier is designed, consisting of several binary classifiers. As for the classification algorithm, a special center-vector algorithm is designed, in which the center vector is calculated from the feature-evaluation function. Expected Cross Entropy is used as the feature-evaluation function, but it is modified to fit the center-vector algorithm. To design the training algorithm for this classifier, two concepts are defined: the feature-extraction threshold and the classification threshold. The classifier is then trained by exhaustive search. With the techniques above, a full-fledged short-message text classifier is built. In open tests, recall reaches 91.6% and precision reaches 92.4%. This classification performance meets the needs of short-message service user behavior analysis.
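The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not state how Expected Cross Entropy was modified, so the sketch assumes that each term is weighted by its per-class contribution to the standard Expected Cross Entropy, P(t) * P(c|t) * log(P(c|t) / P(c)). The function names, the cosine similarity measure, the binary document vectors, the F1 training criterion, and the threshold grids are all hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter

# Assumed feature granularity: English words and single Chinese characters.
TOKEN_RE = re.compile(r"[A-Za-z]+|[\u4e00-\u9fff]")

def tokenize(text):
    """Split a message into lowercase English words and single Chinese characters."""
    return [t.lower() for t in TOKEN_RE.findall(text)]

def class_term_scores(docs, labels, target):
    """Assumed weighting: P(t) * P(c|t) * log(P(c|t) / P(c)) from document counts."""
    n = len(docs)
    p_c = sum(1 for y in labels if y == target) / n
    df, df_pos = Counter(), Counter()
    for doc, y in zip(docs, labels):
        for t in set(doc):
            df[t] += 1
            if y == target:
                df_pos[t] += 1
    scores = {}
    for t, n_t in df.items():
        p_c_t = df_pos[t] / n_t
        if p_c_t > 0:
            scores[t] = (n_t / n) * p_c_t * math.log(p_c_t / p_c)
    return scores

def train_center(docs, labels, target, feature_threshold):
    """Center vector: terms whose score exceeds the feature-extraction threshold."""
    scores = class_term_scores(docs, labels, target)
    return {t: s for t, s in scores.items() if s > feature_threshold}

def similarity(center, doc):
    """Cosine similarity between a binary document vector and the center vector."""
    terms = set(doc)
    dot = sum(center.get(t, 0.0) for t in terms)
    norm_c = math.sqrt(sum(w * w for w in center.values()))
    norm_d = math.sqrt(len(terms))
    if norm_c == 0 or norm_d == 0:
        return 0.0
    return dot / (norm_c * norm_d)

def f1(center, threshold, docs, labels, target):
    """F1 of one binary classifier on labeled documents (assumed training criterion)."""
    tp = fp = fn = 0
    for doc, y in zip(docs, labels):
        pred = similarity(center, doc) >= threshold
        if pred and y == target:
            tp += 1
        elif pred:
            fp += 1
        elif y == target:
            fn += 1
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def train_exhaustive(docs, labels, target, feat_grid, cls_grid):
    """Try every (feature threshold, classification threshold) pair; keep the best."""
    best = (-1.0, None, None)
    for ft in feat_grid:
        center = train_center(docs, labels, target, ft)
        for ct in cls_grid:
            score = f1(center, ct, docs, labels, target)
            if score > best[0]:
                best = (score, center, ct)
    return best[1], best[2]

def classify(centers, thresholds, text):
    """Flat classifier: run every binary classifier and return the classes that fire."""
    doc = tokenize(text)
    return [c for c, center in centers.items()
            if similarity(center, doc) >= thresholds[c]]
```

In this sketch each class gets its own center vector and classification threshold, so a message can be assigned to zero, one, or several classes, which is one plausible reading of a flat classifier built from binary classifiers.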
Keywords/Search Tags: short message, text classification, algorithm, user behavior analysis