A Web Crawler Supporting AJAX

Posted on: 2008-12-04 | Degree: Master | Type: Thesis
Country: China | Candidate: B Luo | Full Text: PDF
GTID: 2178360212984907 | Subject: Computer applications
Abstract/Summary:
The web crawler is an important component of a search engine. Web developers increasingly use AJAX (Asynchronous JavaScript and XML) to build applications that are easier to use and more functional than traditional web programs. An AJAX page sends requests to the web server asynchronously and changes its own content dynamically once the data arrives. As a result, a traditional web crawler, which reads only the HTML returned by the initial request, collects less data than is presented in the web browser. We propose a new web crawler, AjaxCrawler, which supports AJAX. AjaxCrawler is composed of five parts: crawling the web page, analyzing the web page, interpreting JavaScript, invoking DOM operation methods, and regenerating the web page. First, it crawls the web page with an HTTP request. Second, it analyzes the page elements: not only the links, but also the JavaScript code and script files in the page. Third, it executes the JavaScript code, including the AJAX requests, obtains the results from the server, and invokes DOM operation methods to change the content of the web page. Finally, it regenerates the web page and extracts the links. In our experiments, AjaxCrawler collected more content than a traditional crawler under the same conditions.
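The steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the names `PageAnalyzer`, `extract_links`, `ajax_crawl`, `fake_fetch`, and `fake_render` are hypothetical, the site contents are invented, and the JavaScript interpreter and DOM are replaced by a hand-written stand-in (`fake_render`) because the abstract does not say which engine AjaxCrawler uses.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageAnalyzer(HTMLParser):
    """Step 2: collect links, inline JavaScript, and script file URLs."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.inline_scripts = []
        self.script_files = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "script":
            if attrs.get("src"):
                self.script_files.append(attrs["src"])
            else:
                self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script and data.strip():
            self.inline_scripts.append(data.strip())


def extract_links(html):
    """What a traditional crawler sees: links in the static HTML only."""
    analyzer = PageAnalyzer()
    analyzer.feed(html)
    return analyzer.links


def ajax_crawl(url, fetch, render):
    """Run the five steps; `fetch` and `render` are supplied by the caller."""
    html = fetch(url)                                   # 1. crawl the page
    analyzer = PageAnalyzer()
    analyzer.feed(html)                                 # 2. analyze the page
    scripts = list(analyzer.inline_scripts)
    scripts += [fetch(urljoin(url, src)) for src in analyzer.script_files]
    regenerated = render(html, scripts)                 # 3-4. run JS, apply DOM changes
    return extract_links(regenerated)                   # 5. extract links again


# --- Hypothetical demo data; a real crawler would use HTTP and a JS engine ---
SITE = {
    "http://example.test/": (
        '<html><body><a href="/about">About</a>'
        '<div id="news"></div><script src="/app.js"></script></body></html>'
    ),
    "http://example.test/app.js": "loadNews();",
}


def fake_fetch(url):
    return SITE[url]


def fake_render(html, scripts):
    # Stand-in for the interpreter: pretend loadNews() issued an AJAX
    # request and inserted one link into the #news element.
    if any("loadNews" in s for s in scripts):
        return html.replace(
            '<div id="news"></div>',
            '<div id="news"><a href="/news/1">Item</a></div>',
        )
    return html


static_links = extract_links(fake_fetch("http://example.test/"))
ajax_links = ajax_crawl("http://example.test/", fake_fetch, fake_render)
print(static_links)
print(ajax_links)
```

In this toy setup the static pass finds only the link present in the initial HTML, while the AJAX-aware pass also finds the link that the simulated script inserts. Passing `fetch` and `render` in as parameters keeps the pipeline separate from any particular HTTP client or JavaScript engine.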
Keywords/Search Tags: Search Engine, Web Crawler, AJAX, Web 2.0