Research On Content-Based Short Message Filtering System

Posted on: 2007-10-24
Degree: Master
Type: Thesis
Country: China
Candidate: Q Wang
Full Text: PDF
GTID: 2178360185466460
Subject: Communication and Information System
Abstract/Summary:
The WSM (Worthless Short Message) problem is becoming increasingly serious and has attracted extensive public attention. Content-based filtering is one of the main techniques used to address it.

This thesis introduces the short message delivery pipeline as it relates to WSM, four models of natural language representation, and the three filtering methods most widely used in text filtering. Word segmentation is performed with the FreeICTCLAS system developed by the Institute of Computing Technology, Chinese Academy of Sciences, which converts sentences into words and lays the groundwork for categorization. The traditional Bayesian filtering method is improved by combining Naive Bayes with risk minimization, which raises categorization accuracy. In addition, ten models of WSM are constructed, and meaning-based (semantic) filtering is explored by combining HowNet-based semantic similarity with the KNN algorithm. Building on these techniques, the thesis proposes a content-based short message filtering system for mobile phones, with text categorization at its core. A BP (back-propagation) neural network is used in constructing the classifier: it integrates the Bayesian filter and the semantic filter and determines the optimal parameters. Finally, the whole system is tested and an accuracy of 70% is achieved.

Although the construction of the whole system is completed in this thesis, some defects remain that require further study.
Keywords/Search Tags: Worthless Short Message, Content-Based Filtering, Text Classification, BP neural network, Natural Language
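
The abstract mentions improving Bayesian filtering with risk minimization but gives no formulas. The following is a minimal Python sketch of that general idea, not the thesis's actual implementation. A multinomial Naive Bayes model with Laplace smoothing produces class posteriors, and the decision with the smallest expected loss is chosen. The labels "wsm" and "normal", the toy English tokens, the function names, and the loss values are all illustrative assumptions. A real system would take its tokens from a word segmenter such as FreeICTCLAS.

```python
import math
from collections import Counter


def train(docs):
    """Collect per-class word counts from (tokens, label) pairs."""
    word_counts = {}          # label -> Counter of word frequencies
    class_counts = Counter()  # label -> number of training messages
    vocab = set()
    for tokens, label in docs:
        class_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return word_counts, class_counts, vocab


def posteriors(tokens, model):
    """Return P(class | tokens) under multinomial Naive Bayes."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    log_scores = {}
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior
        for t in tokens:
            # Laplace (add-one) smoothing avoids zero probabilities.
            score += math.log((word_counts[label][t] + 1)
                              / (total_words + len(vocab)))
        log_scores[label] = score
    # Normalize in a numerically stable way.
    top = max(log_scores.values())
    exp_scores = {k: math.exp(v - top) for k, v in log_scores.items()}
    z = sum(exp_scores.values())
    return {k: v / z for k, v in exp_scores.items()}


def min_risk_decision(post, loss):
    """Pick the decision with the smallest expected loss.

    loss[(decision, true_class)] is the cost of making `decision`
    when the message really belongs to `true_class`.
    """
    risks = {d: sum(loss[(d, c)] * p for c, p in post.items())
             for d in post}
    return min(risks, key=risks.get), risks


# Toy training data (assumed, for illustration only).
training = [
    (["win", "prize", "now"], "wsm"),
    (["cheap", "prize", "offer"], "wsm"),
    (["meet", "for", "lunch"], "normal"),
    (["see", "you", "at", "lunch"], "normal"),
]
model = train(training)

# Assumed loss table: blocking a normal message costs 9 times
# as much as letting a worthless one through.
loss = {
    ("wsm", "wsm"): 0.0, ("wsm", "normal"): 9.0,
    ("normal", "wsm"): 1.0, ("normal", "normal"): 0.0,
}
post = posteriors(["prize", "now"], model)
decision, risks = min_risk_decision(post, loss)
```

With this asymmetric loss table, a message is marked as WSM only when its posterior exceeds 0.9, so borderline messages are delivered rather than blocked. That trade-off is the usual motivation for minimum-risk decisions in message filtering.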