Web Crawler System Based On Chrome Extension

Posted on:2017-04-18

Degree:Master

Type:Thesis

Country:China

Candidate:S P Wei

GTID:2308330503953780

Subject:Software engineering

Abstract/Summary:

With the advent of the big data era, information on the network has been growing explosively; for instance, the number of posts published on Sina Weibo each day reaches 120 million. However, as information becomes unprecedentedly abundant, it is increasingly difficult for users to obtain the data they need. The scattered results returned by traditional search engines such as Baidu and Google no longer meet users' requirements; what they need, both in professional data analysis and in daily life, is effectively integrated data. Crawling is one of the technologies used to integrate internet data. However, the crawler technology in common use today is difficult to develop with, unstable, and not user friendly, so it cannot meet users' needs. It is therefore valuable to develop a new crawler system that is simple to extend, highly stable, widely applicable, and user friendly.

This thesis first analyzes the crawler systems, crawler technologies, and anti-crawler strategies currently used in China and abroad, together with the reasons why web crawler systems are complex to implement, unstable, and unfriendly to users. On this basis, a new crawler system based on Chrome was built. Furthermore, to meet different user needs and take advantage of the internet, two kinds of information capture modules are proposed for the web crawler system based on Chrome extensions: a personal-version information capture module extension and a server-version information capture module extension.

Finally, to support the high-concurrency demands that the personal-version information capture module places on the central server module, the central server module is built on the Netty framework and the database module uses a master-slave database configuration. So that the central server module scales better as the number of requests grows, the thesis programs to interfaces and introduces the Spring framework to manage the dependencies between the central server module and its classes.

The crawler system designed and developed in this thesis is easy to develop for, easy to extend, and supports many types of web pages, including static pages, asynchronously loaded pages, and dynamic pages. The personal-version information capture module can also make full use of the internet by drawing on every user of the crawler to capture information. Results in the test environment show that all of the features above were implemented successfully and that the system performs much better than other current crawler systems in user friendliness and capacity.
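The abstract names a master-slave database configuration and programming to interfaces, but it does not say how reads and writes are routed. The Java sketch below shows one common arrangement, read/write splitting, under stated assumptions: writes go to the master and reads rotate across the slaves. The names `CaptureStore`, `InMemoryNode`, and `MasterSlaveStore` are hypothetical and do not come from the thesis. Replication is simulated in memory, and the implementation is passed in by hand where a Spring container would normally inject it.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical abstraction: server code depends only on this interface
// ("program to an interface"), so the concrete store can be swapped or injected.
interface CaptureStore {
    void save(String url, String content); // write path
    String find(String url);               // read path
}

// Stand-in for one database node (assumption: a real system would wrap a JDBC DataSource).
final class InMemoryNode {
    final String name;
    final Map<String, String> rows = new ConcurrentHashMap<>();
    final AtomicInteger reads = new AtomicInteger();
    final AtomicInteger writes = new AtomicInteger();

    InMemoryNode(String name) {
        this.name = name;
    }
}

// Sends writes to the master and spreads reads across the slaves round-robin.
final class MasterSlaveStore implements CaptureStore {
    private final InMemoryNode master;
    private final List<InMemoryNode> slaves;
    private final AtomicInteger next = new AtomicInteger();

    MasterSlaveStore(InMemoryNode master, List<InMemoryNode> slaves) {
        this.master = master;
        this.slaves = slaves;
    }

    @Override
    public void save(String url, String content) {
        master.rows.put(url, content);
        master.writes.incrementAndGet();
        // Replication is simulated synchronously here; real replication is
        // performed by the database and is usually asynchronous.
        for (InMemoryNode slave : slaves) {
            slave.rows.put(url, content);
        }
    }

    @Override
    public String find(String url) {
        InMemoryNode node = pickReadNode();
        node.reads.incrementAndGet();
        return node.rows.get(url);
    }

    // Falls back to the master when no slave is configured.
    private InMemoryNode pickReadNode() {
        if (slaves.isEmpty()) {
            return master;
        }
        int index = Math.floorMod(next.getAndIncrement(), slaves.size());
        return slaves.get(index);
    }
}
```

Because callers hold only a `CaptureStore` reference, a different routing policy or a real database-backed implementation could replace `MasterSlaveStore` without changes to the calling code.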

Keywords/Search Tags:

Crawler, Chrome Extension, JavaScript, Netty, Master-slave Database

Related items

1	Application Research Of Web Crawler Based On Chrome Headless In Web Vulnerability Scanning
2	Design And Implementation Of Semi-automated Test System In Master-Slave Database
3	Implementation Of Multi-master Replication Database Extension Based On MySQL Replication
4	Design Of Master Manipulator Of Master-slave Telecontrol Robot Based On The Internet
5	Development And Maneuverability Of Modular Master-slave Robot Teleoperation System
6	Research On Master-slave Tele-robotics System Based On Virtual Reality
7	Research Of Master-slave Synchronization Optimization Methods On DDL Operations For MySQL Cluster
8	Research On A Master-slave Interactive Control System Of A Hydraulic Manipulator
9	Research On The Precision Of Universal Surgical Master Hand With Force Sense
10	Design And Mechanical Analysis Of A Master–slave Manipulator