The Research And Realization Base On The Discovery And Analysis Of Crawlers Domain

Posted on: 2017-11-14
Degree: Master
Type: Thesis
Country: China
Candidate: T Sun
Full Text: PDF
GTID: 2348330515465272
Subject: Software engineering
Abstract/Summary:
With the rapid development of the Internet, China's share of Internet resources has grown year by year. By the end of 2015, the total number of domain names in China had reached 31.02 million. As Internet resources have grown, a large number of illegal sites have also begun to spread. Ensuring network and information security and keeping network content clean is therefore particularly important for Internet content administrators in China and around the world. The core approach to network information security is first to find the problem sites, then to locate their access points, and finally to shut the sites down or apply other corrective measures. There are currently a variety of ways to discover domain names. Among them, the content-based web crawler framework, which visits sites by simulating human browsing, has become one of the most effective means of safeguarding Internet information security. This thesis studies distributed web crawling technology and designs and implements a “domain detection system”. The system can crawl deeply into a specified site according to configured search strategies. While collecting the site's domain names, it performs DNS cloud resolution on them to obtain the IP addresses through which the sites are accessed, thereby achieving website discovery and location. This offers a reference technical implementation for Internet service providers managing the domain names under their jurisdiction. Through this research and implementation, the thesis further shows that web crawler technology finds sites quickly, supports web page analysis and site access location, and has low operating costs. It has played an important role in maintaining network stability.
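The discovery-and-location step described in the abstract (collect domain names from crawled pages, then resolve each one to an IP address) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the names `LinkCollector`, `extract_domains`, and `locate_domains` are hypothetical, it handles a single page instead of a distributed crawl, and it substitutes an ordinary system DNS lookup (`socket.gethostbyname`) for the "DNS cloud resolution" the thesis mentions.

```python
import socket
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_domains(html, base_url):
    """Return the sorted host names linked from one page."""
    parser = LinkCollector()
    parser.feed(html)
    domains = set()
    for href in parser.hrefs:
        # Resolve relative links against the page URL before parsing.
        host = urlparse(urljoin(base_url, href)).hostname
        if host:  # skips links without a host, e.g. mailto:
            domains.add(host)
    return sorted(domains)


def locate_domains(domains, resolve=socket.gethostbyname):
    """Map each domain to an IPv4 address, or None if lookup fails.

    The resolver is injectable; the default is a plain system DNS
    lookup, assumed here in place of the thesis's cloud resolution.
    """
    located = {}
    for domain in domains:
        try:
            located[domain] = resolve(domain)
        except OSError:  # socket.gaierror is a subclass of OSError
            located[domain] = None
    return located
```

In a fuller system, the output of `extract_domains` would also feed a crawl queue governed by the configured search strategy, and the resolved addresses would be matched against IP attribution data; both are outside this sketch.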
Keywords/Search Tags:web crawler, search strategies, distributed crawling, DNS cloud resolution, IP attribution