
Research And Implementation Of SMS Content Filtering Technology Under The Chinese Mobile Platforms

Posted on: 2009-11-01 | Degree: Master | Type: Thesis
Country: China | Candidate: X Chen | Full Text: PDF
GTID: 2208360245961044 | Subject: Information and Communication Engineering
Abstract/Summary:
Today's Chinese mobile market needs SMS filtering technology designed for Chinese text. Mature SMS filtering technologies already exist for English, but current Chinese SMS filtering relies mainly on junk lists and keyword filtering. The system proposed in this thesis implements a Simple Rules Filtering technology that combines SMS content features, improving precision and recall while reducing the Normal SMS False Alarm Rate.

SMS content filtering is a form of text categorization. At present, the two most popular techniques applied to content filtering are Maximum Entropy and Decision Tree. In this thesis, these two algorithms are used in a comparative filtering test against a newly introduced Chinese SMS content filtering technology. This technology is divided into two parts.

The first part is Rules Matching, the first phase of SMS content filtering. In this phase, Key Rule Matching is the most important algorithm, and it requires a Chinese multi-pattern matching algorithm. However, classic algorithms such as AC and WM were both designed for English content. This thesis introduces a new Chinese-oriented multi-pattern algorithm, UIAC. Together with UIAC, other rules are used to extract the content features of Chinese SMS: the length of the message, phone numbers, punctuation, URLs, and so on. The Chinese encoding conversion is also performed in this phase. The output of this phase is a vector intermediate file.

The second part is Filtering, the second phase of SMS content filtering. This thesis introduces the Simple Rules Filtering algorithm.
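The feature extraction in the Rules Matching phase can be illustrated with a minimal Python sketch. This is not the thesis's implementation. The keyword list, the regular expressions, and the name `extract_features` are all assumptions made for illustration, and a naive substring scan stands in for a multi-pattern matcher such as UIAC, whose details the abstract does not give.

```python
import re

# Hypothetical keyword rules; the actual rule set is not given in the abstract.
KEYWORDS = ["发票", "中奖", "贷款"]

# Assumed patterns: a run of 7 to 11 digits counts as a phone number,
# and a URL starts with http://, https://, or www.
PHONE_RE = re.compile(r"\d{7,11}")
URL_RE = re.compile(r"(?:https?://|www\.)[A-Za-z0-9./?=&%_#:-]+", re.IGNORECASE)
PUNCT = set("!?！？。，,.;；:：")


def extract_features(text):
    """Map one short message to a small feature vector (as a dict)."""
    urls = URL_RE.findall(text)
    # Strip URLs first so their digits and dots are not counted
    # as phone numbers or punctuation.
    body = URL_RE.sub(" ", text)
    return {
        "length": len(text),
        "phone_count": len(PHONE_RE.findall(body)),
        "punct_count": sum(1 for ch in body if ch in PUNCT),
        "url_count": len(urls),
        # Naive substring scan standing in for a multi-pattern matcher.
        "keyword_hits": sum(1 for kw in KEYWORDS if kw in text),
    }


if __name__ == "__main__":
    print(extract_features("恭喜中奖！请致电13800138000 或访问 www.example.com"))
```

One such vector per message, written out line by line, would play the role of the vector intermediate file that the filtering phase consumes.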
Compared with Maximum Entropy and Decision Tree, the Simple Rules Filtering algorithm is easier to implement on resource-limited mobile platforms.

For comparison, the Rules Matching phase of the test produces three vector intermediate files: a Simple Rules Filtering vector file, a Maximum Entropy vector file, and a Decision Tree vector file. The latter two files are processed by the Maximum Entropy model and the Decision Tree model, respectively. The Precision Rate, Recall Rate, and Normal SMS False Alarm Rate of the three algorithms are then compared. The test uses 1000 short messages, 500 normal and 500 junk, as input to all three algorithms. The results show that the Simple Rules algorithm performs close to the other two algorithms and has an advantage in Normal SMS False Alarm Rate and in implementation efficiency.
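The three evaluation measures can be computed as in the sketch below. The abstract does not define them, so the usual definitions are assumed: precision and recall are measured on the junk class, and the Normal SMS False Alarm Rate is the share of normal messages wrongly flagged as junk. The function name `evaluate` and the toy labels are illustrative and are not the thesis's data.

```python
def evaluate(y_true, y_pred):
    """Return (precision, recall, normal_false_alarm_rate).

    Labels: 1 = junk, 0 = normal. The definitions are assumed,
    not taken from the thesis.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # junk caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # normal flagged as junk
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # junk missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal passed

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    false_alarm = fp / (fp + tn) if fp + tn else 0.0
    return precision, recall, false_alarm


if __name__ == "__main__":
    # Toy labels for illustration only.
    truth = [1, 1, 1, 0, 0, 0, 0, 1]
    predicted = [1, 1, 0, 0, 1, 0, 0, 1]
    print(evaluate(truth, predicted))
```

Running the same function over the predictions of each of the three filters on the same message set would give directly comparable figures.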
Keywords/Search Tags: SMS Content Filtering, Simple Rules Set, Multiple Pattern Match, Maximum Entropy, Decision Tree