Design And Implementation Of Website Text Data Acquisition System

Posted on:2016-10-14

Degree:Master

Type:Thesis

Country:China

Candidate:D Tian

Full Text:PDF

GTID:2298330470955549

Subject:Software engineering

Abstract/Summary:


Internet public opinion monitoring systems are a product of the development of new media. They monitor the spread of information on the network in real time, enabling users to discover, understand and track public opinion as early as possible, which in turn makes crime prevention possible. The web crawler, as one component of public opinion monitoring, determines how close to real time the monitoring can be. This thesis designs and implements a data acquisition system that customizes and crawls the content of targeted sites by configuring website templates, thereby providing real-time data sources.

The data acquisition system crawls the content of targeted sites mainly through two subsystems: the crawler resource configuration and monitoring platform, and the information crawling platform. The configuration and monitoring platform is built on JavaEE open-source development frameworks such as Struts2 and Spring. Its layered architecture and modular design improve the system's productivity and extensibility. The information crawling platform draws on the framework of Heritrix, the open-source crawler hosted on SourceForge, and has been redesigned and redeveloped to meet the needs of the company's own products. The configuration and monitoring platform is responsible for configuring the crawl information, including sites, channels, seeds, templates and other settings. It can also test and verify the accuracy of the configured templates, and it provides a dynamic chart of crawled information that makes it much easier for users to monitor the volume of data collected.
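As a rough illustration of template-driven extraction and template testing, the Java sketch below models a website template as a site, a channel, a list of seed URLs and one extraction rule per field. The class names are hypothetical, and the use of regular expressions (with capture group 1 holding the field value) is an assumption; the abstract does not say how the thesis's templates express their rules. Here a template is treated as accurate for a sample page when every configured field yields a non-empty value.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical website template: the site, channel and seed URLs it covers,
 * plus one regular expression per field. Each pattern must contain a capture
 * group; group 1 is taken as the field value. Regex rules are an assumption.
 */
class SiteTemplate {
    final String site;
    final String channel;
    final List<String> seeds;
    final Map<String, Pattern> fieldRules = new LinkedHashMap<>();

    SiteTemplate(String site, String channel, List<String> seeds) {
        this.site = site;
        this.channel = channel;
        this.seeds = seeds;
    }

    /** Adds an extraction rule; DOTALL lets a value span several lines of HTML. */
    SiteTemplate rule(String field, String regex) {
        fieldRules.put(field, Pattern.compile(regex, Pattern.DOTALL));
        return this;
    }

    /** Applies every field rule to the page; fields with no match are omitted. */
    Map<String, String> extract(String html) {
        Map<String, String> record = new LinkedHashMap<>();
        for (Map.Entry<String, Pattern> e : fieldRules.entrySet()) {
            Matcher m = e.getValue().matcher(html);
            if (m.find()) {
                record.put(e.getKey(), m.group(1).trim());
            }
        }
        return record;
    }
}

/** Hypothetical template test: lists the configured fields that failed to extract. */
class TemplateTester {
    static List<String> missingFields(SiteTemplate t, String sampleHtml) {
        Map<String, String> record = t.extract(sampleHtml);
        List<String> missing = new ArrayList<>();
        for (String field : t.fieldRules.keySet()) {
            String value = record.get(field);
            if (value == null || value.isEmpty()) {
                missing.add(field);
            }
        }
        return missing;
    }
}
```

For example, a template with the rules `title` = `<h1>(.*?)</h1>` and `body` = `<div class="content">(.*?)</div>`, tested against a page that has an `<h1>` heading but no content `<div>`, reports `[body]` as missing, flagging the template for correction.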
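The dynamic chart of crawled information needs a time series of crawl volume. One plausible way to produce it, sketched below with a hypothetical class name and under the assumption of one-minute buckets, is a thread-safe counter that crawler threads update and the chart reads periodically.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/**
 * Hypothetical per-minute counter of crawled records, the kind of series a
 * dynamic chart of crawl volume could poll. Minute buckets are an assumption.
 */
class CrawlVolumeCounter {
    private final ConcurrentHashMap<Instant, LongAdder> buckets = new ConcurrentHashMap<>();

    /** Records one crawled item; safe to call concurrently from many crawler threads. */
    void record(Instant when) {
        Instant minute = when.truncatedTo(ChronoUnit.MINUTES);
        buckets.computeIfAbsent(minute, k -> new LongAdder()).increment();
    }

    /** Returns a snapshot of counts ordered by minute, ready to plot. */
    Map<Instant, Long> series() {
        Map<Instant, Long> out = new TreeMap<>();
        buckets.forEach((minute, count) -> out.put(minute, count.sum()));
        return out;
    }
}
```

`LongAdder` keeps increments cheap under contention, which matters more here than exact reads, since the chart only needs an approximately current snapshot.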
The crawler configuration and monitoring platform can also export the records of inaccurate templates so that they can be corrected. The information crawling platform is mainly concerned with crawling website information. It crawls the content of a webpage in four steps: seed loading, webpage loading, webpage parsing and data storage.

In designing and developing the system, the author completed the following five tasks:
(1) Gathering user requirements and investigating the current state of the crawler industry, in order to determine the overall requirements of the system and the functional requirements of each template.
(2) Designing the overall architecture of the system and dividing it into functional modules.
(3) Working out how to implement the functions of each module according to that division, and completing the design of modules including information configuration management, template testing, crawling records, crawl seed acquisition, HTML loading, template parsing and data enqueuing.
(4) Programming the functional modules according to a concrete plan.
(5) Testing the functions of the most important modules and checking the accuracy of data acquisition.

As a trial version, the system satisfies the basic needs of users. However, it is not yet a competitive product in the industry. Future work should improve the automation of module configuration and the efficiency with which the crawler acquires information, so that the system becomes competitive and brings considerable profit to the company.
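Exporting the records of inaccurate templates could be as simple as writing failed template tests to CSV for review. The record fields (site, channel, missing fields) and class names below are hypothetical; quoting follows RFC 4180 conventions so that values containing commas or quotes survive a round trip through a spreadsheet.

```java
import java.util.List;

/** Hypothetical record of a template that failed verification. */
class TemplateFailure {
    final String site;
    final String channel;
    final List<String> missingFields;

    TemplateFailure(String site, String channel, List<String> missingFields) {
        this.site = site;
        this.channel = channel;
        this.missingFields = missingFields;
    }
}

/** Exports failures as CSV so operators can review and correct the templates. */
class FailureExporter {
    /** Wraps a value in double quotes, doubling any embedded quotes (RFC 4180). */
    static String quote(String value) {
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    /** One header row, then one CRLF-terminated row per failed template. */
    static String toCsv(List<TemplateFailure> failures) {
        StringBuilder sb = new StringBuilder("site,channel,missing_fields\r\n");
        for (TemplateFailure f : failures) {
            sb.append(quote(f.site)).append(',')
              .append(quote(f.channel)).append(',')
              .append(quote(String.join(";", f.missingFields))).append("\r\n");
        }
        return sb.toString();
    }
}
```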
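The four crawling steps of the information crawling platform (seed loading, webpage loading, webpage parsing and data storage) can be sketched as a simple loop. Heritrix organizes a crawl around a frontier of pending URLs and chains of processors; the single-threaded loop below only borrows that general shape and is not Heritrix's API. The `PageLoader` interface, the parser function and the storage queue are hypothetical stand-ins, and parsed records are placed on a queue (the data enqueue step) so that a separate consumer can write them to storage.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

/** Webpage-loading abstraction; a real loader might use java.net.http.HttpClient. */
interface PageLoader {
    String load(String url) throws Exception;
}

/**
 * Hypothetical crawl loop following the four steps named in the abstract.
 * Records are enqueued rather than written directly, so storage can run
 * on a separate consumer thread.
 */
class CrawlPipeline {
    private final PageLoader loader;
    private final Function<String, Map<String, String>> parser;
    final BlockingQueue<Map<String, String>> storageQueue = new LinkedBlockingQueue<>();
    final List<String> failedUrls = new ArrayList<>();

    CrawlPipeline(PageLoader loader, Function<String, Map<String, String>> parser) {
        this.loader = loader;
        this.parser = parser;
    }

    void run(List<String> seeds) {
        Queue<String> frontier = new ArrayDeque<>(seeds);      // 1. seed loading
        while (!frontier.isEmpty()) {
            String url = frontier.poll();
            String html;
            try {
                html = loader.load(url);                       // 2. webpage loading
            } catch (Exception e) {
                failedUrls.add(url);                           // keep for the crawling records
                continue;
            }
            Map<String, String> record = parser.apply(html);   // 3. webpage parsing
            if (!record.isEmpty()) {
                storageQueue.offer(record);                    // 4. enqueue for data storage
            }
        }
    }
}
```

Decoupling parsing from storage through a queue lets slow database writes proceed without blocking page loading; a production crawler would also need multiple worker threads, politeness delays and retry limits, none of which are shown here.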

Keywords/Search Tags:

Public Opinion Monitoring, Crawler, JavaEE


Related items

1	Design And Development Of Web Game Opinion Monitoring System Based On Web Crawler
2	Research On The Key Technology Of Public Opinion Monitoring On The Web
3	Network Public Opinion Monitoring System For Agriculture Products Based On Big Data
4	The Design And Implement Of Cmcc Public Opinion Monitoring System
5	Design And Implementation Of COVID-19 Network Public Opinion Monitoring System Based On Deep Learning
6	Research And Development Of Internet Public Opinion Monitoring Model Based On Web Crawler And Lucene Index
7	Design And Realization Of An Internet Public Opinion Monitoring System
8	Design And Implementation Of Internet Public Opinion Information Management
9	Design And Implementation Of Network Public Opinion Monitoring System Based On BS Model
10	Design And Implementation Of Network Media Public Opinion Detection And Analysis System