The Research Of Web Authority Nodes Mining To Restrain The Malicious Web Page

Posted on: 2009-02-15
Degree: Master
Type: Thesis
Country: China
Candidate: J F Luo
Full Text: PDF
GTID: 2178360278957054
Subject: Management Science and Engineering
Abstract/Summary:
Because web resources are vast and structurally complex, web pages are difficult to manage in an orderly way, and more and more malicious pages are mixed in among them. More seriously, because of the limitations of search engines, malicious pages are often returned as authority resource nodes through illegitimate means. Many anti-virus tools try to restrain malicious pages by blocking the malicious code hidden in them, or by giving a safety warning when the user is about to open one. These methods make the task depend entirely on anti-virus software or on some content-identification technique, and they do not work well. More recently, new methods have approached the problem from the viewpoint of link analysis. Once malicious content has been identified, however, the common practice is simply to filter out the malicious pages and their links; links to malicious pages are not distinguished from other links when pages are ranked.

This paper mainly discusses link-based mining of authority web pages in an environment that contains malicious pages. After introducing the current state of graph mining and its general theory, and based on some reasonable assumptions, the paper studies the impact of malicious web pages on users' surfing behavior and presents a new surfing behavior model. The new model takes prior information about malicious pages into account and, more importantly, converts the problem of mining authority web pages into that of solving for the steady-state distribution of a Markov chain. Under the new surfing model, we put forward a new page rank algorithm with a negative link weight penalty that restrains links to malicious pages: web pages that link to malicious pages are punished. Subsidiary nodes are introduced to ensure the correctness and effectiveness of the algorithm under different conditions.

Both theoretical analysis and simulation results show that the authority values of nodes linking to malicious nodes are reduced, and that the more such links there are and the larger their link weights, the greater the reduction. The authority of pages without links to malicious pages, by contrast, is increased. The algorithm therefore effectively restrains links to malicious nodes from the perspective of link analysis. All simulation data are generated by a web graph model, which is credible.

Finally, this paper gives some improved algorithms and states the conditions under which the algorithm can be used.
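
The abstract does not give the algorithm's formulas, so the following Python sketch is only an illustration of the general idea, not the thesis's method. It assumes a hypothetical surfer who follows a link into a page with probability `trust = 1 / (1 + penalty * bad)`, where `bad` is the number of known malicious pages that the target links to. The remaining probability mass, together with the usual teleport and dangling-page mass, passes through one hypothetical subsidiary node that jumps to a uniformly random page, so the result is still the steady-state distribution of a Markov chain. The names `penalized_rank`, `penalty`, `damping`, and the toy graph are all assumptions introduced here.

```python
def penalized_rank(links, malicious, penalty=1.0, damping=0.85, iters=200):
    """Power iteration for a PageRank-style chain with a link penalty.

    links:     dict mapping a page to the list of pages it links to
    malicious: set of pages known in advance to be malicious
    penalty:   assumed weight of each link to a malicious page
    """
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)

    # Assumption: trust in a page shrinks with its number of malicious out-links.
    trust = {}
    for p in pages:
        bad = sum(1 for q in links.get(p, []) if q in malicious)
        trust[p] = 1.0 / (1.0 + penalty * bad)

    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        nxt = {p: 0.0 for p in pages}
        spare = 0.0  # mass routed through the hypothetical subsidiary node
        for p in pages:
            outs = links.get(p, [])
            if not outs:
                spare += rank[p]  # dangling page: all mass is redistributed
                continue
            spare += (1.0 - damping) * rank[p]  # ordinary teleport step
            share = damping * rank[p] / len(outs)
            for q in outs:
                nxt[q] += share * trust[q]            # link is followed
                spare += share * (1.0 - trust[q])     # link is refused
        for p in pages:
            nxt[p] += spare / n  # subsidiary node jumps uniformly
        rank = nxt
    return rank


# Toy graph: "a" and "b" are symmetric except that "a" links to malicious "m".
toy_links = {
    "hub": ["a", "b"],
    "a": ["hub", "m"],
    "b": ["hub", "c"],
    "c": ["hub"],
    "m": ["hub"],
}
baseline = penalized_rank(toy_links, malicious=set())
penalized = penalized_rank(toy_links, malicious={"m"})
print("a:", baseline["a"], "->", penalized["a"])
print("b:", baseline["b"], "->", penalized["b"])
```

With `malicious` empty, the sketch reduces to ordinary PageRank with uniform teleportation. In the toy graph, marking `m` as malicious lowers the score of `a`, which links to it, and raises the score of `b`, which does not, matching the qualitative behavior described in the abstract.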
Keywords/Search Tags: malicious web pages, Markov chain, negative link weight penalty