
Research on a Crawling Model and Strategy for Crawling Cloud Computing Product Data from RIAs

Posted on: 2015-08-27
Degree: Master
Type: Thesis
Country: China
Candidate: S Wang
Full Text: PDF
GTID: 2298330467976648
Subject: Management Science and Engineering
Abstract/Summary:
The recently flourishing cloud computing industry gives small businesses and ordinary users an opportunity to reduce their operating and application costs. As more and more products appear on the market, however, it also brings the challenge of dealing with a large amount of product information. The challenge arises from the highly customizable nature of cloud computing products and from the fact that product information is scattered in isolation across the web servers of different vendors. The sheer volume of product information, together with the difficulty of processing it, makes it hard to support decision making. In the Web 1.0 era, search engines were built to cope with the problem of extracting useful information from a virtually unlimited amount of data, collecting it from the Web with a scraping system (i.e. a Web crawler). Things became considerably more complicated with the arrival of Web 2.0 and the technologies developed alongside it, such as Ajax and jQuery. These technologies turned traditional web pages into more sophisticated RIAs (Rich Internet Applications), from which extracting data with traditional web crawlers became infeasible once technical costs are taken into account. Almost all cloud product information resides in RIAs, so there is an urgent demand for a scraping system suited to extracting data from RIAs in order to support decision making on the adoption of cloud computing.
This paper studies traditional web crawlers together with existing research on Ajax-friendly Web 2.0 crawlers, and proposes a design for a system that can crawl customizable product information from RIAs. The system is built on the existing general-purpose scraping framework Scrapy and extended with a series of improvements in crawl rule processing, the crawling algorithm, the scheduling scheme, script execution, DOM modification and user event invocation.
Unlike traditional crawlers and existing Ajax-friendly crawlers, this system is neither a general-purpose crawler nor a specialized crawler that works on one specific web site; it is a crawler suited to a family of RIAs whose product contents are customizable. Its main difference from traditional crawlers is that it can reach information hidden beneath the surface web of RIAs by executing scripts and modifying DOMs. Its main difference from Ajax-friendly crawlers is that it can simulate user operations, triggering user events in RIAs by invoking a different web page model and its polling methods.
The research in this paper on crawling cloud computing product information from RIAs benefits decision making on the adoption of cloud computing technology by supplying the process with more detailed information obtained from the deep web. The study is also of some significance for research on other products that support customization.
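The contrast drawn above, between reading only the initial page and reaching hidden product information by firing user events, can be illustrated with a toy sketch. This is not the thesis's implementation and does not use Scrapy or a real browser: `ToyProductPage`, its option names, its price formula and `crawl_states` are all hypothetical stand-ins, chosen only to show a breadth-first exploration of the DOM states of a customizable product.

```python
from collections import deque


class ToyProductPage:
    """Hypothetical stand-in for a RIA product page (not a browser or Scrapy API)."""

    # Assumed customization options of an imaginary cloud product.
    OPTIONS = {"cpu": [1, 2], "ram_gb": [2, 4]}

    def __init__(self):
        # Default selection: all that a static crawler would ever see.
        self.selection = {"cpu": 1, "ram_gb": 2}

    def events(self):
        """List the user events (option choices) that can be fired on the page."""
        return [(name, value)
                for name, values in self.OPTIONS.items()
                for value in values]

    def fire(self, name, value):
        """Simulate a user event: the page script changes the DOM state."""
        self.selection[name] = value

    def dom(self):
        """Snapshot of the visible content, including a script-computed price."""
        price = 5 * self.selection["cpu"] + 2 * self.selection["ram_gb"]
        return {**self.selection, "price": price}


def crawl_states(page_factory):
    """Breadth-first search over DOM states reachable by firing user events."""
    seen = {}
    queue = deque([()])  # each entry is the event path that leads to a state
    while queue:
        path = queue.popleft()
        page = page_factory()  # reload the page, then replay the event path
        for name, value in path:
            page.fire(name, value)
        snapshot = page.dom()
        key = tuple(sorted(snapshot.items()))
        if key in seen:
            continue  # state already recorded; do not expand it again
        seen[key] = snapshot
        for event in page.events():
            queue.append(path + (event,))
    return list(seen.values())


if __name__ == "__main__":
    for state in crawl_states(ToyProductPage):
        print(state)
```

In this sketch the initial `dom()` call corresponds to what a traditional crawler retrieves, while `crawl_states` also records the configurations that appear only after events are fired. Replaying the event path from a fresh page keeps the example simple; a real system would more likely snapshot and restore browser state.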
Keywords/Search Tags: RIA, cloud computing products, data crawling