Design And Implementation Of Distributed Crawler Project Based On Biomedical Literature Data

Posted on:2018-01-06

Degree:Master

Type:Thesis

Country:China

Candidate:Y Gao

Full Text:PDF

GTID:2348330518479488

Subject:Computer technology

Abstract/Summary:


With the rapid development of the Internet, mining and applying massive data is becoming a new trend. According to the International Data Corporation (IDC), the amount of data generated worldwide reached 1.82 ZB in 2011. Data in the life sciences is also growing rapidly. The swift adoption of gene sequencing and protein sequencing technologies has accumulated large volumes of biomedical data, and drug design, drug screening, and clinical trials are further sources of massive data, so human health data in the life sciences has reached a remarkable scale. However, medical researchers and practitioners have difficulty using the medical literature and cannot make full use of it. This thesis studies the basic principles of web crawlers, their classification, and the related analysis algorithms. To cope with anti-crawler measures, it introduces the Scrapy crawler framework, distributed crawling, and dynamic web page crawling techniques. Building on these studies, the author proposes a distributed Scrapy-Redis-Selenium+PhantomJS crawler framework and uses it to implement a crawler system for PubMed web pages. The system mainly extracts the titles and abstracts of literature on a given subject. For ease of use, the user interface of the crawler system is built with the Qt framework. Finally, the thesis summarizes the work and proposes directions for further optimization. In short, this thesis focuses on the design and implementation of a distributed crawler for biomedical data. The system supports dynamic web pages and also improves the speed of information collection. It therefore provides a technical means for distributed crawling of PubMed web pages and a more efficient way to obtain relevant medical literature data.
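The abstract names Scrapy-Redis as the basis for distributing the crawl. The idea commonly associated with Scrapy-Redis is that every crawler node pulls requests from one shared queue and checks one shared set of request fingerprints, so a page scheduled by any node is never scheduled again. The sketch below illustrates that idea in plain Python under stated assumptions: an in-memory queue and set stand in for Redis, threads stand in for separate machines, and the names `SharedFrontier`, `crawl_worker`, and `run_crawl` as well as the simplified SHA-1 fingerprint are hypothetical. It is not the thesis's code or the scrapy-redis API. Page download, Selenium/PhantomJS rendering, and title/abstract extraction are replaced by a fixed, hypothetical link graph.

```python
import hashlib
import queue
import threading


class SharedFrontier:
    """Hypothetical in-memory stand-in for a Redis-backed request queue
    plus a fingerprint set shared by every crawler node."""

    def __init__(self):
        self._queue = queue.Queue()  # pending requests (URLs)
        self._seen = set()           # fingerprints of every URL ever scheduled
        self._lock = threading.Lock()

    @staticmethod
    def fingerprint(url, method="GET"):
        # Simplified fingerprint (assumption): SHA-1 of method and URL only.
        return hashlib.sha1(f"{method} {url}".encode("utf-8")).hexdigest()

    def push(self, url):
        """Schedule url unless any node has already scheduled it."""
        fp = self.fingerprint(url)
        with self._lock:  # check-and-add must be atomic across workers
            if fp in self._seen:
                return False
            self._seen.add(fp)
        self._queue.put(url)
        return True

    def pop(self, timeout=0.1):
        """Return the next URL, or None if none arrives within `timeout` seconds."""
        try:
            return self._queue.get(timeout=timeout)
        except queue.Empty:
            return None


def crawl_worker(name, frontier, link_graph, results, results_lock):
    """One crawler node: pop a URL, 'process' it, push newly found links."""
    while True:
        url = frontier.pop()
        if url is None:
            return  # queue stayed empty: this node stops
        # A real node would fetch/render the page here and extract fields.
        with results_lock:
            results.append((name, url))
        for link in link_graph.get(url, []):
            frontier.push(link)  # duplicates are dropped by the shared set


def run_crawl(seed, link_graph, n_workers=3):
    """Crawl from seed with n_workers threads sharing one frontier."""
    frontier = SharedFrontier()
    frontier.push(seed)
    results, results_lock = [], threading.Lock()
    workers = [
        threading.Thread(
            target=crawl_worker,
            args=(f"node-{i}", frontier, link_graph, results, results_lock),
        )
        for i in range(n_workers)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results


if __name__ == "__main__":
    # Hypothetical link graph: search result pages link to article pages.
    graph = {
        "search:1": ["article:101", "article:102", "search:2"],
        "search:2": ["article:102", "article:103"],
        "article:101": ["article:103"],
    }
    for node, url in run_crawl("search:1", graph):
        print(node, url)
```

Each URL in the example graph is processed exactly once, even though `article:102` and `article:103` are each linked from two pages. In a real multi-machine deployment the lock would not help, since it is local to one process. A Redis set could take its place instead: the `SADD` command returns the number of members actually added, so a single command both checks and records a fingerprint for all nodes.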

Keywords/Search Tags:

biomedicine, PubMed, Scrapy-Redis, crawler, distributed


Related items

1	Design And Implementation Of Distributed Web Crawler System Based On Scrapy
2	Design And Development Of Distributed Crawler Based On Scrapy Framework
3	Research And Application Of Efficient Data Acquisition Methods For Domain Data
4	Design And Implementation Of Search System Based On Scrapy-redis And GMM
5	Design And Implementation Of A Distributed Crawler System Based On Scrapy Framework
6	Design And Implementation Of Distributed Books Web Crawler System
7	Design And Implementation Of Web Crawler System Based On Scrapy Framework
8	Design And Implementation Of Distributed Crawler System Based On Docker Cluster
9	Design And Implementation Of Distributed Online Book Crawler System
10	Scrapy Framework-based Web Crawler Achieved Data Capture And Analysis