Research Of Identifying Splog Based On Multiple Structure Features

Posted on:2012-07-17

Degree:Master

Type:Thesis

Country:China

Candidate:Y He

Full Text:PDF

GTID:2218330368489885

Subject:Systems Engineering

Abstract/Summary:

In recent years, blogs have continued to grow rapidly as a social networking application. After email, BBS, and ICQ, the blog has become the fourth major form of online communication. Because blogs play an increasingly important role in establishing, maintaining, and developing relationships, they have become part of people's daily lives. As the influence of blogs has grown, a by-product, the spam blog (splog), has also emerged. Splogs waste storage resources and network bandwidth, and they distort the ranking of search results. Their sheer number has seriously reduced the accuracy of information retrieval, degraded the user experience, and discouraged continued use of blogs. Precisely identifying splogs has therefore become one of the challenges of information retrieval: splogs must be identified and filtered out before further analysis and retrieval.

In this thesis, we propose a splog identification method based on multiple structure features, building on existing content-based feature extraction for splogs. We analyze the structure features of splogs at the homepage and post levels through an analysis of splog intent. We obtain blog addresses from search engine results in order to build a more realistic and targeted splog identification data set. We propose a splog identification model that combines multiple structure features with Naive Bayes and SVM. In the experiments, we set the parameters using a training data set and evaluate the identification model using a test data set.

The main contents of the thesis are as follows:

1. Building on existing research, we analyze the multiple structure features of splogs through splog intent analysis and propose a feature extraction algorithm.

2. We build a blog collection system. First, we obtain blog addresses from search engine results. Second, we collect the blog data set from those addresses. Third, we process the data and manually label each entry as a blog or a splog according to the definition of splog.

3. We propose a splog identification method based on multiple structure features. We build the splog identification model by combining this method with Naive Bayes and SVM. We train the model on the training data set and evaluate it on the test data set. We compare the experimental results of our method with those of the content-based method and analyze the results.
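The pipeline described in the abstract (extract structure features from a page, train a classifier on labeled pages, evaluate on held-out pages) can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration: the abstract does not list the actual features, so the two features here (link count and link density) are hypothetical stand-ins; the pages are made-up toy strings, not data from the thesis; and only a small Gaussian Naive Bayes is shown, since an SVM would normally come from an external library.

```python
import math
from html.parser import HTMLParser


class StructureCounter(HTMLParser):
    """Counts a few structural signals in one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = 0
        self.text_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links += 1

    def handle_data(self, data):
        self.text_chars += len(data.strip())


def structure_features(html):
    """Hypothetical features: [link count, links per 100 text characters]."""
    counter = StructureCounter()
    counter.feed(html)
    counter.close()  # flush any text buffered after the last tag
    density = 100.0 * counter.links / max(counter.text_chars, 1)
    return [float(counter.links), density]


class GaussianNaiveBayes:
    """Minimal Gaussian Naive Bayes over numeric feature vectors."""

    def fit(self, X, y):
        self.stats = {}
        for label in set(y):
            rows = [x for x, lab in zip(X, y) if lab == label]
            means = [sum(col) / len(rows) for col in zip(*rows)]
            # A small floor on the variance avoids division by zero.
            variances = [
                sum((v - m) ** 2 for v in col) / len(rows) + 1e-6
                for col, m in zip(zip(*rows), means)
            ]
            self.stats[label] = (len(rows) / len(y), means, variances)
        return self

    def predict(self, x):
        def log_posterior(label):
            prior, means, variances = self.stats[label]
            score = math.log(prior)
            for v, m, var in zip(x, means, variances):
                score += -0.5 * math.log(2 * math.pi * var)
                score -= (v - m) ** 2 / (2 * var)
            return score

        return max(self.stats, key=log_posterior)


# Toy, hand-written pages with manual labels (illustrative only).
train_pages = [
    ("<p>Today I hiked the ridge trail with friends and took photos.</p>", "blog"),
    ("<p>Notes from my reading group this week. <a href='#'>slides</a></p>", "blog"),
    ("<a href='#'>buy</a> <a href='#'>cheap</a> <a href='#'>pills</a> <a href='#'>now</a>", "splog"),
    ("<a href='#'>win</a> <a href='#'>casino</a> <a href='#'>bonus</a> deal "
     "<a href='#'>here</a> <a href='#'>go</a>", "splog"),
]
test_pages = [
    "<p>A long personal post about cooking dinner for my family last night.</p>",
    "<a href='#'>click</a> <a href='#'>free</a> <a href='#'>offer</a> <a href='#'>today</a>",
]

model = GaussianNaiveBayes().fit(
    [structure_features(html) for html, _ in train_pages],
    [label for _, label in train_pages],
)

for html in test_pages:
    print(model.predict(structure_features(html)), "<-", html[:40])
```

The feature extractor and the classifier are kept separate so that a different feature set, or a different classifier such as an SVM, could be swapped in without changing the other half.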

Keywords/Search Tags:

Splog, Multiple structure features, Feature extraction, Naive Bayes, Support vector machine

Related items

1	Decoding Emotion From FMRI Based On Machine Learning
2	Research On Network Traffic Classification Based On Machine Learning
3	Design And Implementation Of The Email Spam Detection System Based On Naive Bayes And Svm
4	Design And Implementation Of The Email Spam Detection System Based On Naive Bayes And SVM
5	Research On Patent Value Classification Prediction Model Based On Machine Learning
6	Feature Extraction And Recognition Of Inland Waterway Vessels Based On Machine Vision
7	Research Of Splog Detection And Its Relative Technologies
8	Completing News Classification By Related Machine Learning Algorithms
9	Researches On Some Problems In Nonparallel Hyperplanes Support Vector Machine And Feature Extraction
10	Research On Chinese Semantic Keyword Extraction Method Based On Multiple Features