Design and Implementation of a Multi-language Anti-spam Short Message System Based on the Vector Space Model

Posted on: 2007-12-09
Degree: Master
Type: Thesis
Country: China
Candidate: Y P Wang
Full Text: PDF
GTID: 2208360185491300
Subject: Computer application technology
Abstract/Summary:
A great deal of information is now transferred by cell phone, and the short message is one of its most important formats. Ordinary short messages speed up our communication with others, but a growing number of people and organizations abuse the channel by sending large volumes of spam short messages, such as advertisements and fraud attempts. These spam messages have a serious negative effect on daily life, so recognizing them has become an urgent task.

In this thesis, we study the use of the Vector Space Model (VSM) to classify short messages. The system is divided into three parts: training, testing, and application. Training and testing are implemented on a personal computer, and the application step is implemented on a smart phone running the Windows Mobile operating system.

In training, following the VSM, we produce a feature list file and a machine learning model file; the model file contains two class center vectors, one for normal messages and one for spam messages. In testing, we compute the cosine between the test vector and each class center vector and use it to assign the test sample to a class. We set threshold values for both the normal short message recognition ratio and the spam short message recognition ratio. If both ratios exceed their expected thresholds, we proceed to the application step. In this step, the system works directly on short messages received on the phone: it decides whether a received message is normal or spam and, according to that judgment, places the message in the corresponding folder on the phone. The results show that the system achieves a good recognition ratio for both Traditional Chinese and Simplified Chinese short messages.
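The training and testing steps described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The abstract does not say how features are extracted or weighted, so the character-bigram features, the raw term-frequency weights, the mean-vector class centers, and every function name here (`char_bigrams`, `train`, `classify`, `recognition_ratios`, `meets_thresholds`) are assumptions made for illustration. Character bigrams are used only because they need no word segmentation, which suits Chinese text.

```python
import math
from collections import Counter


def char_bigrams(text):
    """Split a message into overlapping character bigrams (assumed feature choice)."""
    text = "".join(text.split())  # drop whitespace before pairing characters
    return [text[i:i + 2] for i in range(len(text) - 1)]


def vectorize(text, features):
    """Map a message to a raw term-frequency vector over the feature list."""
    counts = Counter(char_bigrams(text))
    return [float(counts[f]) for f in features]


def cosine(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def train(samples):
    """Build the feature list and one center vector per class.

    samples is a list of (text, label) pairs. Each class center is taken
    to be the mean of that class's training vectors (an assumption).
    """
    features = sorted({g for text, _ in samples for g in char_bigrams(text)})
    centers = {}
    for label in {lab for _, lab in samples}:
        vectors = [vectorize(t, features) for t, lab in samples if lab == label]
        centers[label] = [sum(col) / len(vectors) for col in zip(*vectors)]
    return features, centers


def classify(text, features, centers):
    """Assign the class whose center has the largest cosine with the message."""
    vec = vectorize(text, features)
    return max(sorted(centers), key=lambda label: cosine(vec, centers[label]))


def recognition_ratios(test_samples, features, centers):
    """Per-class fraction of test messages that are classified correctly."""
    totals, correct = Counter(), Counter()
    for text, label in test_samples:
        totals[label] += 1
        if classify(text, features, centers) == label:
            correct[label] += 1
    return {label: correct[label] / totals[label] for label in totals}


def meets_thresholds(ratios, thresholds):
    """True if every class reaches its required recognition ratio."""
    return all(ratios.get(label, 0.0) >= t for label, t in thresholds.items())


if __name__ == "__main__":
    # Toy English data for illustration only; not from the thesis.
    training = [
        ("win a free prize now", "spam"),
        ("free prize claim now", "spam"),
        ("see you at lunch today", "normal"),
        ("lunch at noon today ok", "normal"),
    ]
    features, centers = train(training)
    print(classify("claim your free prize", features, centers))
```

In this sketch the pair `(features, centers)` plays the role of the feature list file and the model file, and `meets_thresholds` stands in for the check that gates the move from testing to the application step.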
Keywords/Search Tags: Vector space model, Machine learning, Anti-spam short message, Text classification