
The Design And Implementation Of Directional Website Crawler And Related Web Services

Posted on: 2015-10-10
Degree: Master
Type: Thesis
Country: China
Candidate: Z D Zhao
GTID: 2308330461456664
Subject: Software engineering
Abstract/Summary:
With the spread of smartphones, more and more mobile applications are becoming available. Some become popular because they offer adequate and accurate data; others become popular for a revolutionary interaction mode. However, small companies that want to develop their own mobile applications often do not possess adequate and accurate data, and insufficient attention to user experience and interaction design also restricts their development.

For data acquisition, a company can either cooperate with a specific data provider or collect the data itself. This paper mainly describes a data acquisition scheme for a mobile application based on a web crawler. The main contents of this paper include:

● A data acquisition crawler based on Python. This part introduces the overall design, concurrent architecture and detailed implementation of the crawler. Its key features include dynamic proxy support, queue-based asynchronous processing and a two-layered crawler structure, among others.

● The design and implementation of the back-end server that relies on the crawler. This part introduces the overall architecture of the server, which is built on top of Apache CXF, and the detailed implementation of each of its services. It also covers the cache system used to improve performance and the details of the interaction between the server and the crawler.

The scheme presented in this paper is low-cost, easy to implement and easy to understand, and is well suited to small and medium-sized enterprises. The application described in this paper performs well on both the Android and iOS platforms. Moreover, its performance can be improved further, because extensibility was fully considered in the design.
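The crawler features named in the abstract (proxy support, a queue-based asynchronous design and two crawl layers) can be illustrated with a minimal Python sketch. This is not the thesis implementation. The proxy addresses, the `crawl` function and the `fetch`, `parse_list` and `parse_detail` callbacks are all hypothetical names introduced here, and the sketch assumes that the first layer visits list pages to discover URLs while the second layer visits the discovered detail pages.

```python
import itertools
import queue
import threading

# Hypothetical proxy pool (assumption); a real crawler would refresh it dynamically.
PROXIES = ["http://proxy-a:8080", "http://proxy-b:8080"]


def crawl(seed_urls, fetch, parse_list, parse_detail, workers=4):
    """Two-layer crawl: list pages yield detail URLs, detail pages yield records.

    fetch(url, proxy), parse_list(body) and parse_detail(body) are supplied
    by the caller; they are illustrative hooks, not part of any library.
    """
    tasks = queue.Queue()
    results = []
    lock = threading.Lock()
    proxy_cycle = itertools.cycle(PROXIES)

    # Layer 1 tasks: the seed list pages.
    for url in seed_urls:
        tasks.put(("list", url))

    def worker():
        while True:
            layer, url = tasks.get()
            try:
                with lock:
                    proxy = next(proxy_cycle)  # rotate proxies per request
                body = fetch(url, proxy)
                if layer == "list":
                    # Layer 2 tasks: enqueue every detail URL found on the list page.
                    for detail_url in parse_list(body):
                        tasks.put(("detail", detail_url))
                else:
                    record = parse_detail(body)
                    with lock:
                        results.append(record)
            except Exception:
                pass  # a real crawler would log the error and retry with another proxy
            finally:
                tasks.task_done()

    for _ in range(workers):
        threading.Thread(target=worker, daemon=True).start()

    tasks.join()  # returns once every queued task has been processed
    return results
```

Because `fetch` is injected, the sketch can be exercised without network access by passing in-memory stand-ins, for example a dictionary lookup in place of an HTTP request. Detail tasks are enqueued before the parent list task is marked done, so `tasks.join()` does not return until both layers have finished.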
Keywords/Search Tags: Crawler, Python, Dynamic proxy, Asynchronous queue, Concurrent