Design and Implementation of a Proactive Identification and Verification System for Particular Websites

Posted on: 2016-08-02
Degree: Master
Type: Thesis
Country: China
Candidate: D H Li
Full Text: PDF
GTID: 2308330479991451
Subject: Computer technology
Abstract/Summary:
As network technology evolves and network services spread, Internet users rely on the network more and more. The same advances have also led to a steady stream of new particular websites, which cause financial losses for Internet users. Particular websites are websites whose content threatens social or personal stability and security. Most are hosted abroad, and they are characterized by rapid growth and propagation. Passive detection alone is not enough for such websites. This thesis proposes a detection and verification system that finds particular websites using active discovery techniques.

To address coverage and accuracy, the thesis uses search technology and vertical search tracking technology to actively find particular websites, starting from a user’s whitelist and keywords. The system expands the keywords with breadth-first search and finds the URLs of suspicious particular websites with depth-first search.

The suspicious URLs found by active detection are then verified by comparison, based on the web page title and the web page structure. The system extracts each page's title and structure as its features. Titles are segmented into keywords, which are used to compare page similarity. The system builds a DOM tree from the page structure, extracts a VTree from it using a DOM node selection algorithm, and computes the final result with a page structure comparison algorithm.

Tests show that the modules run normally and meet their target indicators. On an average day the system found 883 suspicious particular websites and 57 particular websites. The false positive rate and the nonresponse rate are both below 15%.
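
The verification step can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not specify the VTree, the DOM node selection algorithm, or the page structure comparison algorithm, so this sketch swaps in a simpler stand-in. It assumes titles can be split into keywords on word boundaries (a Chinese title would need a real word segmenter), represents page structure as a multiset of root-to-node tag paths, and scores both with Jaccard similarity. All names here (`PageFeatures`, `page_similarity`, `title_weight`) are hypothetical.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not stay on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class PageFeatures(HTMLParser):
    """Collects the page title and a multiset of root-to-node tag paths."""

    def __init__(self):
        super().__init__()
        self.stack = []          # currently open tags, outermost first
        self.paths = Counter()   # tag path -> number of occurrences
        self.title_parts = []    # text found inside <title>

    def handle_starttag(self, tag, attrs):
        self.paths["/".join(self.stack + [tag])] += 1
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        if self.stack and self.stack[-1] == "title":
            self.title_parts.append(data)


def extract(html):
    """Return (title keyword set, tag-path multiset) for one page."""
    parser = PageFeatures()
    parser.feed(html)
    parser.close()
    title = "".join(parser.title_parts).lower()
    # Assumption: word-boundary splitting stands in for title segmentation.
    return set(re.findall(r"\w+", title)), parser.paths


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def structure_similarity(p, q):
    """Multiset Jaccard similarity of two tag-path Counters."""
    union = sum((p | q).values())
    return sum((p & q).values()) / union if union else 1.0


def page_similarity(html_a, html_b, title_weight=0.5):
    """Weighted mix of title and structure similarity, in [0, 1]."""
    title_a, paths_a = extract(html_a)
    title_b, paths_b = extract(html_b)
    return (title_weight * jaccard(title_a, title_b)
            + (1 - title_weight) * structure_similarity(paths_a, paths_b))
```

In use, a suspicious page would be compared against a known particular website with `page_similarity(known_html, suspect_html)`, and a score above a chosen threshold would mark the suspect as verified. The equal title weight and the threshold are illustrative choices and would need tuning on labeled pages.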
Keywords/Search Tags: proactive identification, webpage structure, node selection, webpage comparison