Research on Network Crawling Method of Spatial Data

Posted on: 2020-02-19
Degree: Master
Type: Thesis
Country: China
Candidate: D Ran
GTID: 2428330572974030
Subject: Engineering

Abstract/Summary:
With the development of the Internet, the amount of spatial data on Internet platforms is growing explosively. Web spatial data, ranging from vector and raster data to spatio-temporal data, contains hidden practical information. It is an important data source in the era of spatial big data, which makes acquiring spatial data from the web a key link. Spatial data is generally stored in databases on the web server, while the front end uses web page technologies to display it in spatial form. This thesis analyzes the structure of these web pages in order to crawl spatial data from the back-end databases efficiently and reliably.

The thesis implements crawling for two kinds of data: for vector data it selects POI data and traffic situation data, and for raster data it selects image data. Four methods are implemented: a simulated search method crawls POI data for an entire city; circular sectioning crawls POI data within a circular area; square partitioning crawls POI data and traffic situation data within a rectangular area; and a login-based method crawls image data. To keep spatial data crawling running smoothly, the thesis copes with anti-crawling measures by using proxy IPs, disguising requests as a browser, handling image anti-hotlinking protection, and reducing access frequency. Crawling efficiency is improved through multi-processing and multi-threading, and data cleaning and deduplication are performed during crawling and database storage through program logic and database operations.

Experimental analysis shows that the four methods can crawl POI data in urban, rectangular, and circular areas, traffic situation data in rectangular areas, and image data of specified types. Using multiple processes raises spatial data crawling efficiency threefold. Data cleaning and deduplication greatly improve the accuracy of the crawled data, reducing irrelevant and duplicate records in a single crawl to nearly zero. The four crawling methods are implemented in Python with PyQt in the Eric6 development environment. Software testing shows that they are highly practical and can be applied to large-scale crawling of POI data, traffic situation data, and remote sensing image data.

Finally, based on the crawled data, the thesis analyzes traffic congestion in the rectangular area from Chongqing Jiaotong University to the Yangtze River Bridge, and the factors influencing SO2 pollution concentration in Beijing. The congestion analysis results can be used to formulate travel plans and serve as a reference for traffic command, while the SO2 analysis can guide environmental governance and management. The experiments and applications show that the spatial data crawled by the four methods can support different kinds of spatial data analysis, and that the crawled data is of high value.
Keywords/Search Tags: spatial data, web crawling, Python, multiprocessing, multithreading
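The abstract names square partitioning for rectangular-area crawling but does not spell out the algorithm. The following is a minimal Python sketch under stated assumptions: `query(rect)` is a hypothetical wrapper around a map service's rectangle search, and the service is assumed to cap how many POIs a single query can return (a common reason for partitioning in the first place). `square_grid` covers the study area with fixed square cells; `crawl_rect` adds an adaptive refinement, our assumption rather than necessarily the thesis's method, that splits any cell whose reply reaches the cap into four sub-squares.

```python
import math
from dataclasses import dataclass
from typing import Callable, Iterator, List


@dataclass(frozen=True)
class Rect:
    """Axis-aligned longitude/latitude rectangle, in degrees."""
    min_lng: float
    min_lat: float
    max_lng: float
    max_lat: float

    def split4(self) -> List["Rect"]:
        """Split the rectangle into four equal quadrants."""
        mid_lng = (self.min_lng + self.max_lng) / 2
        mid_lat = (self.min_lat + self.max_lat) / 2
        return [
            Rect(self.min_lng, self.min_lat, mid_lng, mid_lat),
            Rect(mid_lng, self.min_lat, self.max_lng, mid_lat),
            Rect(self.min_lng, mid_lat, mid_lng, self.max_lat),
            Rect(mid_lng, mid_lat, self.max_lng, self.max_lat),
        ]


def square_grid(area: Rect, step: float) -> Iterator[Rect]:
    """Cover `area` with square cells of side `step` degrees, row by row.

    Cells on the east and north edges are clipped to the area boundary.
    """
    # round() guards against float noise such as 0.3 / 0.1 == 2.9999999999999996
    n_cols = math.ceil(round((area.max_lng - area.min_lng) / step, 9))
    n_rows = math.ceil(round((area.max_lat - area.min_lat) / step, 9))
    for i in range(n_rows):
        for j in range(n_cols):
            lng0 = area.min_lng + j * step
            lat0 = area.min_lat + i * step
            yield Rect(lng0, lat0,
                       min(lng0 + step, area.max_lng),
                       min(lat0 + step, area.max_lat))


# Hypothetical wrapper around a map service's rectangle search:
# takes a Rect and returns at most `cap` POI records for it.
Query = Callable[[Rect], list]


def crawl_rect(rect: Rect, query: Query, cap: int, min_size: float) -> list:
    """Query `rect`; if the reply reaches the per-request cap, treat it as
    truncated, split the cell into four sub-squares and recurse.

    POIs lying exactly on a shared edge may be returned twice, so the
    combined output should be deduplicated afterwards.
    """
    results = query(rect)
    width = rect.max_lng - rect.min_lng
    height = rect.max_lat - rect.min_lat
    # Stop when the reply is below the cap, or the cell is too small to split.
    if len(results) < cap or max(width, height) <= min_size:
        return results
    merged = []
    for sub in rect.split4():
        merged.extend(crawl_rect(sub, query, cap, min_size))
    return merged
```

In use, each cell from `square_grid(...)` would be passed to `crawl_rect` and the concatenated results deduplicated. The adaptive split costs a few extra requests in dense cells but avoids silently losing POIs there; a fixed grid alone would need a step small enough for the densest part of the area.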
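The anti-crawling countermeasures listed in the abstract (disguising requests as a browser, handling image anti-hotlinking, proxy IPs, and reduced access frequency) can be sketched with the Python standard library. The thesis does not say which HTTP library it used; the User-Agent strings, the proxy address, and the `Throttle` helper below are illustrative assumptions.

```python
import random
import time
import urllib.request
from typing import Optional

# Illustrative desktop-browser User-Agent strings (assumed values); rotating
# them makes requests resemble ordinary browser traffic.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
]


def build_request(url: str, referer: Optional[str] = None) -> urllib.request.Request:
    """Build a request whose headers look like those of a browser."""
    headers = {"User-Agent": random.choice(USER_AGENTS)}
    if referer:
        # Image servers with anti-hotlinking checks typically reject requests
        # whose Referer is missing or points to a foreign site.
        headers["Referer"] = referer
    return urllib.request.Request(url, headers=headers)


def make_opener(proxy: Optional[str] = None) -> urllib.request.OpenerDirector:
    """Return an opener that routes traffic through `proxy` if one is given."""
    if proxy:
        handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
        return urllib.request.build_opener(handler)
    return urllib.request.build_opener()


class Throttle:
    """Keep at least `min_interval` (+ random jitter) seconds between requests."""

    def __init__(self, min_interval: float, jitter: float = 0.0) -> None:
        self.min_interval = min_interval
        self.jitter = jitter
        self._last: Optional[float] = None

    def wait(self) -> None:
        if self._last is not None:
            target = self._last + self.min_interval + random.uniform(0, self.jitter)
            delay = target - time.monotonic()
            if delay > 0:
                time.sleep(delay)
        self._last = time.monotonic()


def fetch(url: str, opener: urllib.request.OpenerDirector, throttle: Throttle,
          referer: Optional[str] = None, timeout: float = 10.0) -> bytes:
    """Throttled, browser-disguised GET returning the raw response body."""
    throttle.wait()
    with opener.open(build_request(url, referer), timeout=timeout) as resp:
        return resp.read()
```

A crawler would create one opener per proxy (rotating through a pool when an IP is blocked) and share a `Throttle` among the requests that go to the same host.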
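The abstract says cleaning and deduplication are done both in program logic and through database operations, without giving the rules. A minimal sketch under assumed rules: a record is dropped if its coordinates are missing or unparseable, its name is empty, or it lies outside the study area's bounding box; duplicates are detected by the provider's id or, failing that, by name plus position rounded to six decimals. The field names are hypothetical, and SQLite stands in for whatever database the thesis used: a PRIMARY KEY with `INSERT OR IGNORE` rejects records already stored by an earlier crawl.

```python
import sqlite3
from typing import Iterable, List, Optional, Tuple

BBox = Tuple[float, float, float, float]  # (min_lng, min_lat, max_lng, max_lat)


def clean_and_dedupe(records: Iterable[dict], bbox: Optional[BBox] = None) -> List[dict]:
    """Drop unusable or irrelevant POI records and in-batch duplicates.

    Field names `id`, `name`, `lng`, `lat` are assumed, not taken from the thesis.
    """
    seen = set()
    cleaned = []
    for rec in records:
        try:
            lng, lat = float(rec["lng"]), float(rec["lat"])
        except (KeyError, TypeError, ValueError):
            continue  # missing or unparseable coordinates
        name = str(rec.get("name") or "").strip()
        if not name:
            continue  # nameless record, treated as noise
        if bbox is not None:
            min_lng, min_lat, max_lng, max_lat = bbox
            if not (min_lng <= lng <= max_lng and min_lat <= lat <= max_lat):
                continue  # outside the study area, i.e. irrelevant
        # Prefer the provider's id; otherwise fall back to name + rounded position.
        uid = str(rec["id"]) if rec.get("id") else f"{name}@{lng:.6f},{lat:.6f}"
        if uid in seen:
            continue  # duplicate within this crawl
        seen.add(uid)
        cleaned.append({**rec, "uid": uid, "name": name, "lng": lng, "lat": lat})
    return cleaned


def store(conn: sqlite3.Connection, records: List[dict]) -> int:
    """Insert records, letting the PRIMARY KEY reject rows stored earlier.

    Returns the number of rows actually inserted.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS poi ("
        "uid TEXT PRIMARY KEY, name TEXT NOT NULL, lng REAL, lat REAL)"
    )
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO poi (uid, name, lng, lat) VALUES (?, ?, ?, ?)",
        [(r["uid"], r["name"], r["lng"], r["lat"]) for r in records],
    )
    conn.commit()
    return conn.total_changes - before
```

Splitting the work this way means in-memory filtering removes duplicates within one crawl, while the database constraint catches overlaps between crawls (for example, POIs on shared partition edges fetched in separate runs).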