Application Of The Node Importance On Web Spam Detection In Complex Networks

Posted on: 2019-07-12
Degree: Master
Type: Thesis
Country: China
Candidate: J Guo
Full Text: PDF
GTID: 2370330593451049
Subject: Computer Technology and Engineering
Abstract/Summary:
In modern society, people get much of their information through the Internet. Many unscrupulous commercial websites, however, have started to profit by producing web spam, which seriously disturbs the normal order of the network. Detecting web spam has therefore become a major problem. TrustRank assumes that good web pages do not link to web spam, but such links do exist in practice. The paper improves TrustRank by applying node importance ranking methods from complex networks. It proposes a web spam detection algorithm based on betweenness centrality and the clustering coefficient, consisting of two sub-algorithms: BCW and CTRank. The main work of the paper is as follows:

(1) Web spam makers add outlinks in order to improve the ranking of their pages. The paper proposes a new method for selecting the seed set, the BCW algorithm. First, PCA is used to process the data. Betweenness centrality is then used to score each web page, and different weights are defined for web pages, so that the importance score of each page is the weighted sum of its own score and the scores of its outlinks. Finally, the seed set assignment method selects both the higher-scoring and the lower-scoring pages and gives the two groups different initial values, so that together they form the seed set.

(2) The TrustRank ranking algorithm treats the transfer probability between two web pages as the same for every link, but the relationships between web pages should not be treated as equal. The paper proposes the CTRank algorithm, in which the number of inlinks is substituted for the number of neighbors in the clustering coefficient. Based on the clustering coefficient and the number of outlinks of each web page, the transfer matrix is adjusted in different ways. This addresses the fact that the TrustRank algorithm ignores the importance of web pages.

The paper uses the WEBSPAM-UK2007 dataset to evaluate the effectiveness of the algorithm. The ranking results are analyzed with precision, recall, and F value. Experimental results demonstrate the effectiveness of the proposed algorithm.
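
As a rough illustration of the ideas in the abstract, the following is a minimal sketch of seed-based trust propagation in the style of TrustRank, together with the precision, recall, and F value used for evaluation. It is not the BCW or CTRank algorithm from the thesis: the function names, the toy graph, the damping factor `alpha = 0.85`, and the optional `weights` argument (a hypothetical stand-in for a non-uniform transfer matrix) are all assumptions made for this example.

```python
def trustrank(outlinks, seeds, weights=None, alpha=0.85, iterations=50):
    """Propagate trust from seed pages along outlinks (TrustRank-style sketch).

    outlinks: dict mapping a page to the list of pages it links to
    seeds:    dict mapping a seed page to its initial trust (must sum to > 0)
    weights:  hypothetical dict mapping (src, dst) to a positive link weight;
              missing links default to 1.0, i.e. the uniform split that
              plain TrustRank uses
    """
    weights = weights or {}
    pages = set(outlinks) | set(seeds)
    for targets in outlinks.values():
        pages.update(targets)

    total = sum(seeds.values())
    # Normalised seed distribution: trust is re-injected only at seed pages.
    d = {p: seeds.get(p, 0.0) / total for p in pages}
    trust = dict(d)

    for _ in range(iterations):
        nxt = {p: (1 - alpha) * d[p] for p in pages}
        for src, targets in outlinks.items():
            if not targets:
                continue  # dangling page: its trust is not passed on
            w = [weights.get((src, t), 1.0) for t in targets]
            norm = sum(w)
            for t, wt in zip(targets, w):
                nxt[t] += alpha * trust[src] * wt / norm
        trust = nxt
    return trust


def precision_recall_f(predicted_spam, actual_spam):
    """Return (precision, recall, F value) for a predicted set of spam pages."""
    predicted_spam, actual_spam = set(predicted_spam), set(actual_spam)
    tp = len(predicted_spam & actual_spam)
    precision = tp / len(predicted_spam) if predicted_spam else 0.0
    recall = tp / len(actual_spam) if actual_spam else 0.0
    denom = precision + recall
    f_value = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_value


# Toy graph (assumed for illustration): a, b, c are good pages; s1, s2 are spam.
# Spam links to good pages, but no good page links to spam.
outlinks = {
    "a": ["b"],
    "b": ["c"],
    "c": ["a"],
    "s1": ["s2", "a"],
    "s2": ["s1"],
}
seeds = {"a": 1.0}  # a single trusted seed page

trust = trustrank(outlinks, seeds)
# Pages that no trust ever reaches are flagged as spam in this toy setting.
predicted_spam = {page for page, score in trust.items() if score == 0.0}
print(sorted(predicted_spam))
print(precision_recall_f(predicted_spam, {"s1", "s2"}))
```

In this toy graph no good page links to spam, so the spam pages receive no trust at all. The abstract's point is that real graphs violate this assumption, which is why the thesis adjusts both the seed set and the transfer matrix; passing non-uniform `weights` here only hints at that second idea.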
Keywords/Search Tags: Ranking Algorithm, TrustRank, Betweenness Centrality, PCA, Clustering Coefficient