Design And Implementation Of Stock Board Crawler Based On IP Agent Pool

Posted on: 2020-12-21
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Sun
Full Text: PDF
GTID: 2428330575487084
Subject: Software engineering
Abstract/Summary:
Technology is the foundation of national prosperity. In recent years, the state has attached great importance to technological innovation, and Internet technology has been applied and promoted ever more widely. In the field of stock trading, web crawler technology in the "Internet +" era has likewise made it possible to give investors more convenient information and customized investment strategies. This thesis describes a stock board crawler based on an IP proxy pool. It addresses two problems: shareholders cannot perceive real-time changes in the different stock boards in time, and a traditional crawler cannot automatically get past anti-crawling mechanisms when crawling stock data. The aim is to make the crawler better suited to data exploration on stock boards and to further improve its capture efficiency. The thesis covers the following aspects:

1. Research on the key technologies of a stock crawler based on an IP proxy pool, mainly including: 1) deploying jar packages as Maven sub-modules to simulate a microservice architecture, exposing the IP proxy pool interface at minimal construction cost and keeping project compilation flexible; 2) using the Dubbo distributed RPC framework with Zookeeper to implement remote services, covering registration and invocation between the provider and consumer sides of the interface, and using its long-lived connections to reduce the network overhead of fetching proxy IPs; and 3) integrating the Quartz task scheduling service, using Scheduler, Trigger, Job and other core classes to design and develop the scheduled board-monitoring crawler task and the scheduled proxy IP crawler task.

2. Design and implementation of the scheduled stock board monitoring crawler subsystem. The Web requests of the concept board website and the industry board website are analyzed. Because the websites' search IDs are irregular, the crawler performs strategic jumps keyed on the unique stock codes and efficiently encapsulates the returned data. The Quartz task scheduler is integrated to capture and update the stock board data on schedule, ensuring the timeliness of the monitoring data.

3. Design and implementation of the IP proxy pool interface subsystem. The Quartz task scheduler captures proxy IPs from IP proxy websites at regular intervals. Based on the calculated activity of the proxy IPs and the frequency with which the crawler calls the interface, a service integration strategy is designed that combines multi-threaded IP capture, IP speed measurement, IP de-duplication, a Redis cache, a MySQL database and Quartz timers. The strategy is intended to improve the interface's availability, data consistency and latency, and thus to provide reliable support for the interface calls made by the monitoring crawler.

Through the development of these core modules, the IP proxy pool subsystem is integrated into the Spring MVC framework and cascaded with the stock board crawler subsystem. The combined system presents board data in a timely, efficient and accurate way, serves investors and investment enthusiasts, and offers some ideas for strategies against anti-crawling mechanisms.
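
The de-duplication and speed-based selection steps of the proxy pool can be illustrated with a minimal in-memory sketch. This is not the thesis's implementation: the class `ProxyPool`, its methods and the latency threshold are hypothetical names chosen for illustration, a `ConcurrentHashMap` stands in for the Redis cache and MySQL store, and the latency value is assumed to come from a separate speed-measurement step that is not shown.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory proxy pool (illustration only, not the thesis code). */
class ProxyPool {
    // Key "host:port" -> last measured latency in milliseconds.
    // Keying by address is what de-duplicates repeated captures of the same proxy.
    private final Map<String, Long> latencyByProxy = new ConcurrentHashMap<>();
    private final long maxLatencyMillis;

    ProxyPool(long maxLatencyMillis) {
        this.maxLatencyMillis = maxLatencyMillis;
    }

    /**
     * Adds a proxy or refreshes its latency.
     * A negative latency (assumed to mean "measurement failed") or a latency
     * above the threshold evicts the proxy instead.
     * Returns true only when the proxy was not already in the pool.
     */
    boolean offer(String host, int port, long latencyMillis) {
        String key = host + ":" + port;
        if (latencyMillis < 0 || latencyMillis > maxLatencyMillis) {
            latencyByProxy.remove(key);
            return false;
        }
        return latencyByProxy.put(key, latencyMillis) == null;
    }

    /** Returns the proxy with the lowest recorded latency, if any. */
    Optional<String> fastest() {
        return latencyByProxy.entrySet().stream()
                .min(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey);
    }

    /** Number of distinct proxies currently pooled. */
    int size() {
        return latencyByProxy.size();
    }
}
```

In a setup like the one the abstract describes, a scheduled job would call `offer` for each captured proxy after measuring it, and the crawler would call `fastest` before each request. The concurrent map is used here so that several capture threads could call `offer` at the same time.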
Keywords/Search Tags:IP Agent Pool, Stock Board, Quartz, Dubbo, Zookeeper