Research And Implementation Of Bad Message Text Detection Method Based On Frequent Pattern Mining

Posted on: 2013-09-16
Degree: Master
Type: Thesis
Country: China
Candidate: J Zhang
Full Text: PDF
GTID: 2208330434470261
Subject: Computer technology
Abstract/Summary:
With the growth of platforms such as Sina Weibo, QQ, MSN, and BBS forums, more and more message texts appear on the Internet. These message texts can be very useful, because they make sharing information more convenient. On the other hand, they can also cause a lot of trouble: message texts that carry violent, pornographic, or similar content are harmful to the Internet environment. We therefore need to identify such texts and block this information from spreading, which requires classifying bad message texts.

Text classification is a mature technology. It is based on statistical learning and works well on normal texts. Because of the particular features of message texts, however, conventional classification does not work well on them, so a different method is needed.

After studying the features of message texts and of frequent patterns, we found an effective way to solve the problem. This thesis proposes a system that extracts frequent patterns from bad message texts and uses them for classification. With this system, bad message texts can be classified and blocked to clean up the Internet environment.

The system is designed around two main steps: a training step and a classifying step. In the training step, we first preprocess the message texts in the bad message text training set and then extract the frequent patterns. In the classifying step, we again preprocess the incoming message text first, and then calculate the similarity between the message text and the frequent patterns of bad message texts. With these two steps in place, the system can judge whether a new message text is a bad message text or not. In our experiments, the system works well on bad message text classification.
Keywords/Search Tags:Frequent Pattern, Message texts, Text Classification
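
The two-step design described in the abstract (preprocess and mine frequent patterns during training, then preprocess and score similarity during classification) can be illustrated with a minimal Python sketch. This is not the thesis's implementation. The abstract does not specify the preprocessing, the mining algorithm, or the similarity measure, so everything below is an assumption made for illustration: whitespace tokenization stands in for real preprocessing, brute-force counting of small term sets stands in for a proper frequent pattern miner, the similarity is simply the fraction of mined patterns contained in the message, and the names `preprocess`, `mine_frequent_patterns`, `similarity`, `is_bad`, `min_support`, `max_size`, and `threshold` are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def preprocess(text):
    """Assumed preprocessing: lowercase and split on whitespace.

    Real message texts (especially Chinese ones) would need word
    segmentation and noise removal; this is only a stand-in.
    """
    return set(text.lower().split())


def mine_frequent_patterns(messages, min_support=2, max_size=2):
    """Training step: collect term sets that occur in many bad messages.

    Counts every term set of size 1..max_size by brute force and keeps
    those contained in at least min_support messages.
    """
    counts = Counter()
    for text in messages:
        terms = sorted(preprocess(text))
        for size in range(1, max_size + 1):
            counts.update(combinations(terms, size))
    return {frozenset(p) for p, c in counts.items() if c >= min_support}


def similarity(text, patterns):
    """Assumed similarity: fraction of patterns fully contained in the text."""
    if not patterns:
        return 0.0
    terms = preprocess(text)
    return sum(1 for p in patterns if p <= terms) / len(patterns)


def is_bad(text, patterns, threshold=0.5):
    """Classifying step: flag the text when its similarity reaches the threshold."""
    return similarity(text, patterns) >= threshold


# Toy training set of "bad" messages (invented for this example).
training = ["win free prize now", "free prize click here", "claim free prize"]
patterns = mine_frequent_patterns(training)

print(is_bad("get your free prize today", patterns))  # True
print(is_bad("meeting at noon", patterns))            # False
```

On this toy data the mined patterns are {free}, {prize}, and {free, prize}. Brute-force enumeration grows quickly with message length and `max_size`, so a real system would more likely use an algorithm such as Apriori or FP-growth. The threshold would also have to be tuned on labeled data.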