
The Study Of Filtering Machine For Junk Short Message Based On Support Vector Machine

Posted on: 2007-04-16 | Degree: Master | Type: Thesis
Country: China | Candidate: S H Qian
GTID: 2178360215995262 | Subject: Computer application technology
Abstract/Summary:
As mobile phones have become an indispensable communication tool in daily life, the short message service, sometimes called the "thumb economy", has developed rapidly. At the same time, however, a large volume of junk short messages has appeared, seriously disrupting people's daily lives. Junk short messages are widespread not only in China but also in developed countries such as Britain and the United States, so this is a global problem.
In this thesis, junk short message filtering is formulated as a text classification problem. A junk short message filtering system based on the support vector machine (SVM) was implemented on Windows XP with Visual C++ 6.0 and Microsoft Access. The modeling and implementation of the system consist of four parts: short message segmentation, feature reduction, text representation of short messages, and automatic classification. In the segmentation part, word segmentation is performed on the collection of short messages and the resulting word information is stored in a database. In the feature reduction part, the number of word features is reduced. In the text representation part, each short message is represented as a vector in the vector space model (VSM). In the automatic classification part, the classifier is trained on the training samples and then used to classify the test samples.
Six feature reduction methods are compared on this system: information gain (IG), the χ2 test (CHI), mutual information (MI), expected cross entropy (ECE), weight of evidence for text (WET), and principal component analysis (PCA). Experiments show that IG performs best in this system. Building on IG, a modified text representation of short messages is proposed, and it gives better classification results than the original representation. The system was also used for a partial study of SVMs for short message classification.
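The abstract does not say which word segmentation algorithm was used. Dictionary-based forward maximum matching is one common approach for Chinese text, and it can be sketched as follows. This is an illustration only. The function name `forward_max_match`, the toy dictionary, and the 4-character maximum word length are assumptions, not details from the thesis.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Segment text by greedy forward maximum matching.

    At each position, take the longest substring (up to max_len characters)
    that appears in the dictionary; if none matches, emit a single character.
    """
    tokens = []
    i = 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if length == 1 or piece in dictionary:
                tokens.append(piece)
                i += length
                break
    return tokens


# Toy example with a hypothetical dictionary; the message reads roughly
# "Congratulations, you have won a prize, please claim it".
print(forward_max_match("恭喜您中奖请领取", {"恭喜", "中奖", "领取"}))
# ['恭喜', '您', '中奖', '请', '领取']
```

Greedy matching is simple and fast, but it can mis-segment ambiguous character sequences. Reverse or bidirectional matching and statistical segmenters are the usual alternatives.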
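For reference, the standard text-categorization definitions of three of the six compared criteria (IG, CHI and MI) are given below in terms of message counts. The abstract does not state which exact variants the thesis used, so these are the textbook forms only. CHI and MI are defined per class. A single score per term is usually obtained by taking the maximum, or the prior-weighted average, over the classes.

```latex
% N messages in total; for a term t and a class c:
% A = # messages in c containing t,      B = # messages not in c containing t,
% C = # messages in c not containing t,  D = # messages not in c not containing t.
\begin{align*}
\mathrm{IG}(t) &= -\sum_{i} P(c_i)\log P(c_i)
  + P(t)\sum_{i} P(c_i \mid t)\log P(c_i \mid t)
  + P(\bar{t})\sum_{i} P(c_i \mid \bar{t})\log P(c_i \mid \bar{t}) \\
\chi^2(t, c) &= \frac{N\,(AD - CB)^2}{(A + C)(B + D)(A + B)(C + D)} \\
\mathrm{MI}(t, c) &= \log \frac{P(t, c)}{P(t)\,P(c)}
  \approx \log \frac{A \cdot N}{(A + C)(A + B)}
\end{align*}
```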
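IG-based feature selection followed by a VSM representation can be sketched as below. IG is computed as IG(t) = H(C) - P(t)·H(C | t) - P(not t)·H(C | not t). The sketch assumes several things:

* Messages are already segmented into token lists.
* A term is treated as present or absent in each message when IG is computed.
* Vectors are weighted by raw term frequency.

The function names and the toy English tokens are hypothetical. The thesis's own modified IG-based representation is not reproduced here.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (bits) of a class distribution given as counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels):
    """IG of every term; docs are token lists and labels their classes.

    A term counts as present in a message if it occurs at least once.
    """
    n = len(docs)
    classes = sorted(set(labels))
    class_counts = Counter(labels)
    h_c = entropy([class_counts[c] for c in classes])
    term_class = {}  # term -> Counter of classes among messages containing it
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            term_class.setdefault(term, Counter())[label] += 1
    scores = {}
    for term, with_t in term_class.items():
        n_t = sum(with_t.values())
        h_t = entropy([with_t[c] for c in classes])
        h_not_t = entropy([class_counts[c] - with_t[c] for c in classes])
        scores[term] = h_c - (n_t / n) * h_t - ((n - n_t) / n) * h_not_t
    return scores


def select_features(scores, k):
    """Keep the k highest-IG terms (ties broken alphabetically)."""
    return sorted(scores, key=lambda term: (-scores[term], term))[:k]


def to_vector(tokens, vocabulary):
    """VSM representation: term frequency of each selected term."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


docs = [["free", "prize", "call"], ["meeting", "tomorrow"],
        ["free", "call", "now"], ["see", "you", "tomorrow"]]
labels = ["junk", "normal", "junk", "normal"]
vocab = select_features(information_gain(docs, labels), 3)
print(vocab)                                       # ['call', 'free', 'tomorrow']
print(to_vector(["free", "call", "free"], vocab))  # [1, 2, 0]
```

Terms that occur only in one class across all of its messages, such as "free" and "call" here, get the maximum IG of 1 bit. Terms seen in a single message score much lower.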
Kernel functions and parameter values for the SVM were selected and optimized, and the resulting classifier was shown to achieve good precision.
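The abstract does not name the SVM solver used in the thesis. The sketch below substitutes kernelized Pegasos, which is stochastic sub-gradient descent on the SVM objective without a bias term. It then selects a kernel and the regularization parameter λ by validation precision, mirroring the kernel and parameter selection described above. All function names, the candidate kernels, and the λ values are hypothetical. Labels are +1 for junk messages and -1 for legitimate ones.

```python
import math
import random


def linear_kernel(a, b):
    """Dot product of two feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def rbf_kernel(gamma):
    """Return a Gaussian (RBF) kernel with width parameter gamma."""
    def kernel(a, b):
        return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))
    return kernel


def train_kernel_pegasos(X, y, kernel, lam=0.01, epochs=20, seed=0):
    """Kernelized Pegasos without a bias term; y[i] is +1 (junk) or -1.

    Returns integer coefficients alpha. The learned decision function is
    sum_j alpha[j] * y[j] * K(x_j, x) / (lam * T), T = total steps;
    only its sign is needed for prediction.
    """
    n = len(X)
    alpha = [0] * n
    rng = random.Random(seed)
    gram = [[kernel(X[i], X[j]) for j in range(n)] for i in range(n)]
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(n), n):  # each example once per epoch, random order
            t += 1
            score = sum(alpha[j] * y[j] * gram[j][i] for j in range(n))
            if y[i] * score / (lam * t) < 1:  # margin violated: hinge-loss step
                alpha[i] += 1
    return alpha


def predict(X_train, y_train, alpha, kernel, x):
    """Sign of the kernel expansion; a zero score is treated as legitimate (-1)."""
    s = sum(a * yj * kernel(xj, x)
            for a, yj, xj in zip(alpha, y_train, X_train) if a)
    return 1 if s > 0 else -1


def precision(y_true, y_pred):
    """Fraction of messages flagged as junk (+1) that really are junk."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == 1)
    return tp / (tp + fp) if tp + fp else 0.0


def select_model(X_tr, y_tr, X_val, y_val, kernels, lams):
    """Grid search over kernels and lambda, keeping the best validation precision."""
    best = None
    for name, kernel in kernels.items():
        for lam in lams:
            alpha = train_kernel_pegasos(X_tr, y_tr, kernel, lam=lam)
            preds = [predict(X_tr, y_tr, alpha, kernel, x) for x in X_val]
            p = precision(y_val, preds)
            if best is None or p > best[0]:
                best = (p, name, lam)
    return best
```

A call such as `select_model(X_tr, y_tr, X_val, y_val, {"linear": linear_kernel, "rbf": rbf_kernel(0.5)}, [0.1, 0.01])` returns a tuple of (best precision, kernel name, λ). Selecting on precision alone favors classifiers that flag few messages, so recall on junk messages should be checked as well.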
Keywords/Search Tags: junk short message, support vector machine, feature reduction, vector space model