Research Of Identifying Splog Based On Multiple Structure Features

Posted on: 2012-07-17
Degree: Master
Type: Thesis
Country: China
Candidate: Y He
Full Text: PDF
GTID: 2218330368489885
Subject: Systems Engineering
Abstract/Summary:
In recent years, blogs, as a social networking application, have maintained rapid growth. Following email, BBS, and ICQ, blogs have become the fourth major form of online communication. Because they play an increasingly important role in establishing, maintaining, and developing relationships, blogs have become part of people's daily lives. As the influence of blogs has grown, a by-product, the spam blog (splog), has also emerged. The large number of splogs discourages people from continuing to use blogs. Splogs not only waste storage resources and network bandwidth but also distort the ranking of search results. They have seriously reduced the accuracy of information retrieval and degraded the user experience. How to identify splogs precisely has therefore become one of the challenges of information retrieval: splogs must be identified so that they can be filtered out before further analysis and retrieval.

In this paper, we propose a splog identification method based on multiple structure features, building on existing content-based feature extraction. We analyze the structure features of splog homepages and posts through an analysis of splog intent. We obtain blog addresses from search engine results in order to build a more realistic and targeted splog identification data set. We propose a splog identification model that combines multiple structure features with Naive Bayes and SVM classifiers. In the experiments, we set the parameters using the training data set and evaluate the identification model using the test data set.

The main contents of the paper are as follows:

1. Building on existing research, we analyze the multiple structure features of splogs through an analysis of splog intent and propose a feature extraction algorithm.

2. We build a blog collection system. First, we obtain blog addresses from search engine results. Second, we collect the blog data set from those addresses. Third, we process the data and manually label each entry as a blog or a splog according to the definition of a splog.

3. We propose a splog identification method based on multiple structure features. We build the identification model by combining this method with Naive Bayes and SVM. We train the model on the training data set and evaluate it on the test data set. We then compare the experimental results of our method with those of the content-based method and analyze the results.
Keywords/Search Tags: Splog, Multiple structure features, Feature extraction, Naive Bayes, Support vector machine
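
The train-then-test workflow described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the three structure features, the toy data, and the function names are all hypothetical, chosen only to show how a Gaussian Naive Bayes classifier could separate blogs from splogs on numeric structure features. The SVM variant mentioned in the abstract is not covered here.

```python
import math
from collections import defaultdict

# Hypothetical structure features per blog (NOT taken from the thesis):
# [outbound links per post, posts per day, fraction of posts sharing one template]
TRAIN = [
    ([2.0, 0.5, 0.10], "blog"),
    ([3.0, 1.0, 0.20], "blog"),
    ([1.0, 0.3, 0.15], "blog"),
    ([25.0, 12.0, 0.90], "splog"),
    ([30.0, 15.0, 0.95], "splog"),
    ([22.0, 9.0, 0.85], "splog"),
]
TEST = [
    ([2.5, 0.8, 0.12], "blog"),
    ([28.0, 11.0, 0.92], "splog"),
]


def fit_gaussian_nb(samples, var_floor=1e-3):
    """Estimate a log prior and per-feature mean/variance for each class."""
    by_class = defaultdict(list)
    for features, label in samples:
        by_class[label].append(features)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Floor the variance so a near-constant feature cannot divide by ~0.
        variances = [
            max(sum((x - m) ** 2 for x in col) / n, var_floor)
            for col, m in zip(columns, means)
        ]
        model[label] = (math.log(n / len(samples)), means, variances)
    return model


def predict(model, features):
    """Return the class with the highest log posterior for one feature vector."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
            for x, m, v in zip(features, means, variances)
        )
    return max(model, key=log_posterior)


# Fit on the training split, then measure accuracy on the held-out test split.
model = fit_gaussian_nb(TRAIN)
predictions = [predict(model, features) for features, _ in TEST]
accuracy = sum(p == y for p, (_, y) in zip(predictions, TEST)) / len(TEST)
print(predictions, accuracy)
```

In practice the feature vectors would come from the feature extraction step applied to collected homepages and posts, and the labels from the manual blog/splog annotation; the toy values above only stand in for that data.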