Design And Implementation Of Webpage Tampering Monitoring System

Posted on: 2019-01-08
Degree: Master
Type: Thesis
Country: China
Candidate: J Liu
GTID: 2348330542998638
Subject: Software engineering
Abstract/Summary:
At present, many government departments, enterprises, institutions and universities have their own websites. However, because most government websites lack a local tamper-proofing system, their webpages are at risk of being tampered with at any time. Tampering may go undiscovered for a long time, and the longer a tampered page stays on a site, the greater the loss to the site owner. Webpage tamper detection has therefore become an important part of government website security. Big cities have many government departments, enterprises, institutions and universities, so the number of government websites is also very large. To monitor, count and analyze tampering of government websites on a large scale, this thesis designs and implements an efficient and practical webpage tampering monitoring system.

The research work focuses on webpage collection and webpage comparison. For webpage collection, this thesis designs and implements a small crawler tailored to the actual needs of the system and optimizes the crawler's performance. Collection separates the link collection procedure from the page download procedure, which reduces the crawler's run times and improves system efficiency. This thesis also presents a method that uses a link tree to calculate the link weight of a webpage. Once the page links are sorted by weight, pages can be filtered and the more important pages can be given a shorter download cycle.

For webpage comparison, this thesis traverses the DOM tree of a page to locate its structure changes, style changes and content changes. Structural changes and multi-level nesting of tag elements both cause problems for DOM tree traversal. This thesis therefore classifies the page's tag elements, defines the structural elements and content elements of a webpage, and then proposes a two-pass DOM tree traversal method that resolves both problems.
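As a rough illustration of locating structure, style and content changes by walking two DOM trees in parallel, the following Python sketch may help. It is not the thesis's two-pass traversal, and it does not reproduce the thesis's classification of structural and content elements, neither of which the abstract specifies. The names `Node`, `TreeBuilder`, `parse`, `diff` and `STYLE_ATTRS` are hypothetical, and treating only the `style` and `class` attributes as style is an assumption made for this example.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag (HTML void elements).
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}
# Assumption for this sketch: only these attributes count as "style".
STYLE_ATTRS = {"style", "class"}


class Node:
    """A minimal DOM-like node: tag name, attributes, own text, children."""
    def __init__(self, tag, attrs=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.text = ""
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a simple tree of Node objects from an HTML string."""
    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].text += data.strip()


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def diff(old, new, path=""):
    """Return a list of (kind, path) pairs, kind being 'structure', 'style' or 'content'."""
    changes = []
    old_style = {k: v for k, v in old.attrs.items() if k in STYLE_ATTRS}
    new_style = {k: v for k, v in new.attrs.items() if k in STYLE_ATTRS}
    if old_style != new_style:
        changes.append(("style", path))
    if old.text != new.text:
        changes.append(("content", path))
    if [c.tag for c in old.children] != [c.tag for c in new.children]:
        # Children no longer line up, so report the change here and stop descending.
        changes.append(("structure", path))
        return changes
    for i, (o, n) in enumerate(zip(old.children, new.children)):
        changes += diff(o, n, f"{path}/{n.tag}[{i}]")
    return changes


baseline = parse('<div class="a"><p>Hello</p><span>x</span></div>')
current = parse('<div class="b"><p>Hacked</p><span>x</span></div>')
for kind, where in diff(baseline, current):
    print(kind, where)
```

The sketch stops descending as soon as the child tag sequences differ, because a naive position-by-position comparison is no longer meaningful below that point. This is the kind of difficulty with structural changes and nested tags that the abstract says motivates the two-pass method.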
Keywords/Search Tags:web crawler, DOM tree, webpage change, webpage tamper