
Research And Application Of Key Technologies Of Text Filtering Based On The Cloud Service Model

Posted on: 2015-03-23
Degree: Master
Type: Thesis
Country: China
Candidate: Y L Lv
GTID: 2308330473950639
Subject: Information security
Abstract/Summary:
The rapid development of the Internet has made it one of the major channels of communication. Because of its openness, however, a large amount of junk information spreads across the network, such as pornography, violence, superstition, and reactionary content, which seriously affects people's daily online activities. Although many text filtering technologies exist, they need continuous improvement as the external environment changes. At the same time, as living standards rise, more and more users access the Internet through mobile terminals. Ensuring that mobile users receive normal, healthy, and useful information through their devices requires implementing text filtering on a cloud platform to filter out junk information. To meet this demand, this thesis analyzes and discusses the existing key technologies of text filtering, improves the traditional text categorization algorithms based on the Vector Space Model (VSM) and Naive Bayes (NB), uses the two improved algorithms to construct a high-performance text filtering system, and deploys the system on a cloud platform to provide a text filtering service for mobile terminals. This ensures that mobile terminal users can access normal and legitimate Web pages on the Internet from their devices.
The main content of this thesis is as follows:
1. Building on an analysis of the feature selection methods commonly used in text filtering, the thesis applies the idea of proportion to the feature selection algorithms, so that the extracted text feature vector reflects the theme and category of the text more accurately.
2. Building on an analysis of the weighting methods commonly used in text filtering, the thesis takes into account structural information, length, proportion, and other information, and improves the traditional weighting methods so that the weight better reflects the importance of each feature item for Web page categorization.
3. Because a Web page is a structured or semi-structured document, the thesis processes Web pages with a modular method to classify their text. The improved proportion-based weighting method and the proportion-based feature selection methods are applied to the traditional categorization algorithms based on the Vector Space Model and Naive Bayes. The two improved algorithms are then used to construct a high-performance text filtering system for Web pages, and the system is deployed on the cloud platform to provide the text filtering service.
Test results show that, compared with the traditional algorithms, the improved text categorization algorithms achieve a higher accuracy rate, a higher precision ratio, a lower error rate, and a lower false rate, among other measures, so the improved text filtering system performs better.
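The abstract does not give the thesis's formulas, so the following is only a minimal sketch of the general idea of structure-aware weighting in a Vector Space Model, not the author's method. It assumes a page is already split into named blocks (here `title` and `body`), that each block carries a hypothetical weight (`BLOCK_WEIGHTS`), that term frequency is normalized to a proportion of the page's total weighted count, and that a smoothed IDF is used. All names and values are illustrative assumptions.

```python
import math
from collections import Counter

# Hypothetical block weights (an assumption, not from the thesis):
# a term in the title counts three times as much as a term in the body.
BLOCK_WEIGHTS = {"title": 3.0, "body": 1.0}


def weighted_tf(page):
    """Block-weighted term counts, normalized so the values sum to 1."""
    counts = Counter()
    for block, text in page.items():
        weight = BLOCK_WEIGHTS.get(block, 1.0)
        for term in text.lower().split():
            counts[term] += weight
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()} if total else {}


def idf(pages):
    """Smoothed inverse document frequency over a list of pages."""
    n = len(pages)
    df = Counter()
    for page in pages:
        terms = set()
        for text in page.values():
            terms.update(text.lower().split())
        df.update(terms)
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}


def vectorize(page, idf_table):
    """Sparse feature vector: weighted TF times IDF (1.0 for unseen terms)."""
    return {t: v * idf_table.get(t, 1.0) for t, v in weighted_tf(page).items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy usage: compare a new page against one junk and one normal example.
junk = {"title": "win money now", "body": "free money casino win"}
normal = {"title": "weather report", "body": "rain expected in the city today"}
table = idf([junk, normal])
query = vectorize({"title": "free casino", "body": "win money today"}, table)
is_closer_to_junk = cosine(query, vectorize(junk, table)) > cosine(
    query, vectorize(normal, table)
)
```

In a filter built along these lines, a page would be assigned to whichever category's example or centroid vector it is most similar to. A Naive Bayes variant could reuse the same block-weighted counts in place of raw term counts.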
Keywords/Search Tags: text filtering, classification, NB, VSM, cloud platform