Research On Search Engine’s Anti-spam Technologies Based On Link Analysis

Posted on: 2013-09-29
Degree: Master
Type: Thesis
Country: China
Candidate: C Chen
GTID: 2248330374476312
Subject: Software engineering
Abstract/Summary:
With the rapid development and popularization of the internet, online activity keeps growing. Data show that the search engine has become the entrance to the internet and one of the main sources from which people get information.

A search engine analyzes queries, compares them with indexed web pages, and extracts the pages with high relevance. Normally, billions of pages have been indexed and thousands of pages are returned by the search engine; however, only the top 10 or 20 pages are browsed by users. Thus, how to rank pages becomes a critical task for a search engine. For website owners, a high rank for their sites can bring great benefit. Since maintaining a high-quality website is difficult and costly, some owners instead try to obtain a high rank by cheating the search engine. Thus, all kinds of web spam have emerged. So far, web spam is generally grouped into two categories, content-based spam and link-based spam, and link-based spam has become the most common on the internet. Web spam not only increases the cost of running a search engine, it also reduces the effectiveness of information retrieval. Thus, research on search engine anti-spam techniques is of great significance for both search engines and users.

Based on existing link-based anti-spam techniques, this paper proposes that, by analyzing the rank value series of web pages, we can extract the properties of web spam and then use them to counteract it. The main contents of this paper are as follows:

1. Firstly, it introduces the principles of search engines and the mathematical model of the internet, and gives a detailed analysis of two of the most popular link-based algorithms: PageRank and HITS. Secondly, it analyzes the most popular link-based spam model: link alliances. Lastly, based on an analysis of the various kinds of spam and anti-spam techniques, it proposes that by analyzing the rank value series of web pages, together with existing anti-spam techniques, we can identify some of the web spam. An experiment has also been carried out, and it verifies the effectiveness and practicality of this method.

2. Design and implement an experiment to extract abnormal domain rank value series. The experiment is based on datasets collected from the whole internet. Using the properties of link alliances, the experiment analyzes the domain rank value series of the web pages. Techniques used in mass data processing are also introduced. By comparing the result pages extracted according to different statistical properties, a detailed analysis is made, and it verifies the validity and effectiveness of rank value series analysis. The extensibility of the experiment is also considered, and future work is outlined.
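
As a rough illustration of the ideas named in the abstract, the sketch below runs a textbook PageRank power iteration on a small invented graph, once without and once with a link alliance pointing at a target page. It is a minimal sketch, not the thesis's implementation: the graph, the page names, the function names `pagerank` and `max_relative_jump`, and the choice to treat two successive rank values of one page as a "rank value series" are all assumptions made for illustration. The abstract does not state which statistical properties the experiment actually uses.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    links maps each page to the list of pages it links to.
    Returns a dict mapping each page to its rank; ranks sum to 1.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the same "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[p] / n
                for q in pages:
                    new_rank[q] += share
        rank = new_rank
    return rank


def max_relative_jump(series):
    """Hypothetical statistic: largest ratio between consecutive rank values."""
    return max(later / earlier for earlier, later in zip(series, series[1:]))


# Invented toy graph: pages a, b, c form an ordinary cycle; t is the target.
links_before = {"a": ["b", "t"], "b": ["c"], "c": ["a"], "t": ["a"]}

# Same graph after t joins a link alliance with farm pages f1..f3,
# which link to t and are linked back by t.
links_after = {
    "a": ["b", "t"], "b": ["c"], "c": ["a"],
    "t": ["a", "f1", "f2", "f3"],
    "f1": ["t"], "f2": ["t"], "f3": ["t"],
}

rank_before = pagerank(links_before)
rank_after = pagerank(links_after)

# Assumed "rank value series" for t: its rank in two successive snapshots.
series_t = [rank_before["t"], rank_after["t"]]
jump_t = max_relative_jump(series_t)
```

In this toy case the target's rank rises once the alliance is added, even though the graph has more pages sharing the total rank, so `jump_t` is greater than 1. A real detector would need many snapshots and a threshold chosen from data.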
Keywords/Search Tags: search engine, web spam, link-based spam, series analysis