Ecological Scientific Investigation Data System With Anti-crawler Mechanism

Posted on: 2022-05-31
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Wei
Full Text: PDF
GTID: 2518306482473284
Subject: Master of Engineering
Abstract/Summary:
Ecological data helps governments and scholars reveal and predict changes in the ecological environment. However, the data held by local ecological departments and ecological research projects is scattered and poorly shared, which makes searching for data time-consuming. As the volume of data grows, maintaining and managing it becomes difficult, and data loss often occurs. The continuous development of web crawler technology also threatens the stability of the system and the security of its data: crawlers can harvest data from the Internet cheaply and at large scale, which makes data security uncertain. At present, web crawler traffic on the Internet has reached the highest level in history, accounting for about 37.2% of total traffic. Restricting and intercepting web crawlers with effective mechanisms has therefore become an important issue that the system must consider. The main work of this paper is as follows:

(1) An ecological scientific investigation data system based on a microservice architecture is proposed and implemented. The system collects, manages, and shares data in cooperation with the relevant local departments. It actively builds a data sharing service model in which online sharing and offline sharing supplement each other, so that the value of the data can be realized.

(2) Traditional anti-crawler mechanisms are easily cracked and then become ineffective. To address this shortcoming, a crawler identification method based on browser fingerprint technology is proposed. The method thoroughly detects changes in the environment of the user's Web browser, and the detected items are difficult to forge.

(3) Crawlers are simulated visiting the current system, features are extracted from the collected request information and behavior information, and a crawler identification method based on the Naive Bayesian classification model is proposed. A random forest algorithm is also used to select the important crawler features, which improves the ability of the Naive Bayesian classification model to recognize crawlers. As the number of visits to the system increases, the classification model can be trained and optimized further by collecting more feature data.

(4) Because crawler recognition accuracy cannot reach 100%, some crawlers still go unrecognized. Anti-crawler processing for Chinese characters and numbers is therefore implemented, which prevents the key information displayed on the HTML page from being easily obtained by a crawler.

(5) A hybrid anti-crawler mechanism is built by combining traditional crawler recognition mechanisms, such as browser fingerprinting, with the crawler recognition model based on Naive Bayesian classification. It is applied in the ecological scientific research data system, where it improves both the system's ability to deal with crawlers and the security of its data.
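
As an illustration of the classification idea in item (3), the following is a minimal sketch of a Naive Bayesian classifier that labels a visitor session as human or crawler. It is not the thesis implementation: the Gaussian variant, the three feature names (requests per minute, mean seconds between requests, share of requests for static resources), and the training values are all assumptions made up for this example, and the random forest feature selection step is left out.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(samples, labels):
    """Estimate a prior plus per-feature mean and variance for each class."""
    grouped = defaultdict(list)
    for features, label in zip(samples, labels):
        grouped[label].append(features)
    model = {}
    for label, rows in grouped.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small floor keeps every variance strictly positive.
        variances = [
            sum((x - m) ** 2 for x in col) / n + 1e-6
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(samples), means, variances)
    return model

def log_posteriors(model, features):
    """Unnormalized log posterior per class, treating features as independent."""
    scores = {}
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for x, m, v in zip(features, means, variances):
            score += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        scores[label] = score
    return scores

def predict(model, features):
    """Return the class with the highest log posterior."""
    scores = log_posteriors(model, features)
    return max(scores, key=scores.get)

# Hypothetical sessions: [requests_per_minute, mean_seconds_between_requests,
# static_resource_ratio]. The numbers are invented for illustration only.
TRAIN_X = [
    [4, 15.0, 0.80], [6, 9.0, 0.70], [3, 20.0, 0.90], [5, 12.0, 0.75],
    [60, 1.0, 0.05], [45, 1.3, 0.00], [80, 0.7, 0.10], [55, 1.1, 0.02],
]
TRAIN_Y = ["human"] * 4 + ["crawler"] * 4

MODEL = fit_gaussian_nb(TRAIN_X, TRAIN_Y)

if __name__ == "__main__":
    print(predict(MODEL, [70, 0.9, 0.03]))
    print(predict(MODEL, [5, 14.0, 0.80]))
```

Calling `fit_gaussian_nb` again on a larger sample corresponds to the retraining on newly collected feature data that item (3) describes. In a real deployment the feature set would come from whatever the feature selection step ranks as important rather than from the three hand-picked columns used here.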
Keywords/Search Tags: Data Sharing, Anti-crawler, Browser Fingerprint, Naive Bayesian Classifier, Random Forest