Research And Implementation On Theme Web Crawler Of Supporting Ajax

Posted on: 2012-11-14
Degree: Master
Type: Thesis
Country: China
Candidate: J Wang
Full Text: PDF
GTID: 2178330335950584
Subject: Computer Science and Technology
Abstract/Summary:
A web crawler, which fetches documents from websites, is an important part of a search engine. With the development of network technology, a growing number of websites have turned to Ajax. Ajax sends requests to the server asynchronously and, once the related data are fetched, dynamically updates the web page without changing its source file. Because a traditional web crawler extracts information from HTML source files, it cannot fetch the dynamic information of pages that use Ajax. Ajax is now applied in many fields; for instance, news sites use it to implement dynamic news commentary, which is of great importance to information collection. Based on the application of Ajax in news sites, this thesis studies and designs a topic web crawler system that supports Ajax and fetches news and dynamic commentary from news websites.

First, the collection of dynamic information from Ajax web pages is studied. Using the browser API, we simulate user behavior to manipulate web pages and collect their dynamic information. Because Ajax web pages of the same type within a website are structurally similar, a pre-processing phase is carried out before the topic information is collected. During this phase, the effective trigger elements are found in the Ajax web pages, converted to protocols, and classified, so that the resulting protocols can drive the collection of dynamic information.

Second, the collection of topic information is implemented. The proposed system uses the semantic characteristics of URLs to distinguish topic information, and it combines the protocol driver and the event driver to fetch news and dynamic commentary.

Finally, experimental results show that this method is effective for acquiring topic information from news sites.
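
The pre-processing phase described in the abstract (finding trigger elements and classifying them) can be illustrated with a small sketch. This is not the thesis's implementation: the abstract says the system works through a browser API, whereas the sketch below only scans static HTML with Python's standard library. The list of event attributes, the (tag, class, event) grouping signature, the sample page, and the names `TriggerFinder` and `group_triggers` are all assumptions made for illustration.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Assumption: an element counts as a candidate trigger if it carries
# one of these inline event-handler attributes.
EVENT_ATTRS = ("onclick", "onmouseover", "onchange")


class TriggerFinder(HTMLParser):
    """Collects elements that carry an inline event handler."""

    def __init__(self):
        super().__init__()
        self.triggers = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        for event in EVENT_ATTRS:
            if attr_map.get(event):
                self.triggers.append({
                    "tag": tag,
                    "css_class": attr_map.get("class") or "",
                    "event": event,
                    "handler": attr_map[event],
                })


def group_triggers(html):
    """Group trigger handlers by a hypothetical (tag, class, event) signature.

    Pages of the same type share structure, so triggers with the same
    signature are treated here as instances of one reusable rule.
    """
    finder = TriggerFinder()
    finder.feed(html)
    groups = defaultdict(list)
    for t in finder.triggers:
        groups[(t["tag"], t["css_class"], t["event"])].append(t["handler"])
    return dict(groups)


# Hypothetical news-comment fragment used only to exercise the sketch.
SAMPLE_PAGE = """
<div id="comments">
  <a class="more" onclick="loadComments(2)">Next page</a>
  <a class="more" onclick="loadComments(3)">Page 3</a>
  <span class="vote" onclick="vote(17)">Like</span>
  <a href="/about.html">About</a>
</div>
"""

rules = group_triggers(SAMPLE_PAGE)
for signature, handlers in rules.items():
    print(signature, handlers)
```

In this sketch the two "more" links fall into one group and the plain `href` link is ignored, which mirrors the idea that structurally similar pages let a crawler learn trigger rules once and reuse them. A real crawler would still need a browser engine to fire the events and read the updated page.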
Keywords/Search Tags: web crawler, Ajax, topic crawler