
Spam Filtering Method Based On Text Mining

Posted on: 2010-01-07
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Zhong
GTID: 2208360275483493
Subject: Computer software and theory
Abstract/Summary:
Short messaging has gradually become part of everyday life because it is a convenient, mobile, and inexpensive way to send and receive messages. However, as short messages have grown in popularity, the junk-message problem has become increasingly serious: it not only disrupts people's lives but also seriously threatens social stability and public safety. Filtering junk messages is therefore an urgent problem, and research on intelligent monitoring technology is of great significance. The junk-message filtering techniques currently in use generally include black/white lists, rule matching, and keyword filtering. However, these techniques must analyze every message one by one, which congests the short message service center and delays message delivery.

To overcome these shortcomings, this thesis proposes a sampling-based junk-message filtering algorithm, in which a sample of short messages is analyzed as representative of the entire message stream. It also introduces the concept of user confidence: the service center inspects a user's messages with an intensity that depends on that user's confidence. The sampled messages are classified by their text, so messages no longer need to be analyzed one by one. The method integrates several existing junk-message filtering technologies and improves the efficiency of handling junk messages.

For text classification, a synergetic neural network is introduced. However, recognition with the classical Haken model is extremely difficult on massive data. Using the rapid-identification principle that holds when the attention parameters are balanced (all equal), the network parameters are modified and the algorithm is improved to handle massive volumes of junk messages.
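The abstract does not specify the sampling rule or how user confidence is computed, so the Python sketch below only illustrates the idea under stated assumptions. Black/white-list lookups apply to every message, a sender's sampling rate is assumed to be `max(min_rate, 1 - confidence)`, and only sampled messages receive a keyword check (a stand-in for the rule and keyword filters). Confidence is assumed to halve on a junk hit and rise by 0.05 on a clean message. The class `SamplingFilter`, the default confidence of 0.5 for unknown senders, and all constants are hypothetical.

```python
import random
from dataclasses import dataclass, field


@dataclass
class SamplingFilter:
    """Hypothetical sampling filter: cheap list checks on every message,
    content checks only on a confidence-dependent sample."""
    blacklist: set
    whitelist: set
    keywords: tuple                      # stand-in for rule/keyword filters
    min_rate: float = 0.05               # assumed floor on the sampling rate
    confidence: dict = field(default_factory=dict)  # sender -> confidence in [0, 1]
    rng: random.Random = field(default_factory=random.Random)
    inspected: int = 0                   # messages that received content analysis

    def sampling_rate(self, sender):
        # Assumed rule: the lower a sender's confidence, the larger the share
        # of their messages the service center inspects. Unknown senders get 0.5.
        conf = self.confidence.get(sender, 0.5)
        return max(self.min_rate, 1.0 - conf)

    def check(self, sender, text):
        if sender in self.whitelist:
            return "deliver"
        if sender in self.blacklist:
            return "junk"
        if self.rng.random() >= self.sampling_rate(sender):
            return "deliver"             # not sampled: forwarded without content analysis
        self.inspected += 1
        junk = any(word in text.lower() for word in self.keywords)
        conf = self.confidence.get(sender, 0.5)
        # Assumed update: a junk hit halves confidence, a clean message raises it by 0.05.
        self.confidence[sender] = conf * 0.5 if junk else min(1.0, conf + 0.05)
        return "junk" if junk else "deliver"


f = SamplingFilter(blacklist={"spammer"}, whitelist={"bank"},
                   keywords=("prize", "lottery"), confidence={"new": 0.0},
                   rng=random.Random(0))
print(f.check("new", "You won the LOTTERY"))  # junk: confidence 0 means every message is inspected
```

Because unsampled messages skip content analysis entirely, a fully trusted sender's traffic is inspected at only about `min_rate`, which is where the reduced load on the service center would come from in this sketch.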
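For readers unfamiliar with Haken's model, the plain-Python sketch below shows the classical synergetic recognition step. Assumptions: messages become bag-of-words vectors over a tiny hypothetical vocabulary `VOCAB` (real Chinese short messages would first need word segmentation); each class prototype is the zero-mean, unit-norm sum of its training vectors (prototypes must be linearly independent); and the order parameters are projections onto the adjoint prototypes, ξ = (VᵀV)⁻¹Vᵀq. The dynamics are the standard Haken equation dξ_k/dt = λξ_k − B·Σ_{k'≠k} ξ_{k'}²·ξ_k − C·(Σ_{k'} ξ_{k'}²)·ξ_k with balanced attention (the same λ for every k). For B > 0, d/dt ln|ξ_k/ξ_j| = B(ξ_k² − ξ_j²), so the order parameter with the largest initial magnitude always wins and the iteration can be skipped. This is one reading of the "rapid identification" principle, not the thesis's exact improved algorithm, which the abstract does not specify.

```python
import math

VOCAB = ["free", "prize", "win", "meeting", "tomorrow", "lunch"]  # hypothetical vocabulary


def vectorize(text):
    """Bag-of-words counts over VOCAB (whitespace tokenization is an assumption)."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]


def preprocess(vec):
    """Zero-mean and unit-norm a vector, as Haken's model requires."""
    mean = sum(vec) / len(vec)
    centred = [x - mean for x in vec]
    norm = math.sqrt(sum(x * x for x in centred)) or 1.0
    return [x / norm for x in centred]


def prototype(texts):
    """Class prototype: preprocessed sum of the class's training vectors."""
    return preprocess([sum(col) for col in zip(*(vectorize(t) for t in texts))])


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        p = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[p] = m[p], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def order_parameters(protos, q):
    """xi = (V^T V)^-1 V^T q: projections of q onto the adjoint prototypes."""
    gram = [[dot(vi, vj) for vj in protos] for vi in protos]
    return solve(gram, [dot(v, q) for v in protos])


def evolve(xi, lam=1.0, b=1.0, c=1.0, step=0.05, iters=2000):
    """Euler-integrate Haken's order-parameter dynamics with balanced attention."""
    for _ in range(iters):
        total = sum(x * x for x in xi)
        xi = [x + step * (lam * x - b * (total - x * x) * x - c * total * x) for x in xi]
    return xi


def classify(text, protos, labels, iterate=False):
    xi = order_parameters(protos, preprocess(vectorize(text)))
    if iterate:
        xi = evolve(xi)
    # With balanced attention the largest |xi| wins, so iterating is optional.
    return labels[max(range(len(xi)), key=lambda k: abs(xi[k]))]


protos = [prototype(["win a free prize", "free prize inside"]),
          prototype(["meeting tomorrow", "lunch tomorrow"])]
print(classify("click to win a free prize", protos, ["spam", "ham"]))  # spam
```

Passing `iterate=True` runs the full dynamics and, under balanced attention, returns the same label as the direct argmax; the shortcut is what makes the model cheap enough for large message volumes.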
In addition, this thesis introduces the MapReduce model and applies both the existing filtering technologies and the synergetic neural network within it. Experiments show that the sampling-based junk-message filtering method improves considerably on content filtering in both accuracy and processing time.

The main work includes:
(1) Summarizing the problems of existing junk short message filtering technologies, and describing the definition and hazards of junk messages.
(2) Summarizing the principle of the synergetic neural network model and improving the existing model into a multi-input, single-output model suitable for identifying massive numbers of short messages.
(3) Implementing a sampling-based monitoring method that integrates the existing junk short message filtering technologies.
(4) Applying the existing filtering technologies within the MapReduce model introduced in this thesis.
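The abstract does not describe how the filters are distributed under MapReduce. One plausible arrangement is sketched below as an in-process simulation rather than a real Hadoop job: the map step classifies each (sender, text) record and emits (sender, (junk_flag, 1)), the shuffle groups pairs by sender, and the reduce step computes each sender's junk ratio, which could in turn feed a per-user confidence score. The function names and record layout are hypothetical, and `is_junk` can be any per-message classifier.

```python
from collections import defaultdict


def map_phase(record, is_junk):
    """Map: classify one (sender, text) record and emit (sender, (junk_flag, 1))."""
    sender, text = record
    yield sender, (1 if is_junk(text) else 0, 1)


def reduce_phase(sender, values):
    """Reduce: turn all (junk_flag, 1) pairs for one sender into a junk ratio."""
    junk = sum(flag for flag, _ in values)
    total = sum(count for _, count in values)
    return sender, junk / total


def run_job(records, is_junk):
    """Simulate map -> shuffle -> reduce in a single process."""
    groups = defaultdict(list)  # shuffle: group intermediate pairs by key
    for record in records:
        for key, value in map_phase(record, is_junk):
            groups[key].append(value)
    return dict(reduce_phase(key, values) for key, values in groups.items())


records = [("a", "win a prize"), ("a", "lunch?"), ("b", "meeting at 3"), ("a", "free prize")]
print(run_job(records, lambda text: "prize" in text))  # {'a': 0.6666666666666666, 'b': 0.0}
```

Because the per-sender sums are associative, a real job could add a combiner that pre-sums the (junk_flag, count) pairs on each mapper before the shuffle, reducing network traffic.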
Keywords/Search Tags: Junk Message, Text Mining, Synergetic Neural Network, Sampling Filter, MapReduce Model