Design And Implementation Of Ranking List System Based On Web Crawler

Posted on: 2019-02-27
Degree: Master
Type: Thesis
Country: China
Candidate: Q W Liu
GTID: 2348330542498705
Subject: Electronics and Communications Engineering
Abstract/Summary:
With the rapid growth of information technology, the quantity of information on the Internet has exploded. How to extract valuable information from massive data and make full use of it has become a serious challenge. Web crawlers address this problem effectively. Drawing on the processing power of computers, a carefully designed web crawler can quickly extract valuable data according to user-defined rules, and it obtains information far more efficiently than manual collection. This project obtains network resources through a crawler, stores the massive data in a distributed database system, analyzes and processes the data, generates the ranking list, and finally displays it on a ranking list page. Based on this plan, this paper designs and implements a ranking system based on a distributed crawler. The system is already running online and achieves the expected results in actual use.

The main work of this paper covers the following four aspects. Firstly, based on a distributed structure, the web crawler is designed to support instant crawling, template-based crawling, configuration-based crawling, and incremental crawling; machines can be added or removed freely, and the crawler can be started and stopped at any time. Its de-duplication rate can reach 100%. Secondly, distributed database middleware is used to build a distributed database system that offers easy maintenance, high availability, good scalability, and high-speed I/O. The middleware also meets the storage requirements of the crawler results and the ranking list data. Thirdly, by analyzing and processing the data, this paper puts forward a ranking list algorithm so that the ranking list yields an objective result. Finally, using the MVC design pattern and the Spring development framework, this paper implements the relevant business logic to complete the design of the ranking web system. Together, these components meet the requirements of the ranking system.
Keywords/Search Tags:web crawler, ranking list, distributed database system, load balancing
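
The abstract does not describe how de-duplication and incremental crawling are implemented. As an illustration only, one common approach is to fingerprint each normalized URL and skip any URL whose fingerprint has been seen before. The sketch below assumes this approach; the names `fingerprint`, `SeenFilter`, and `filter_new` are hypothetical and do not come from the thesis, and the in-memory set stands in for whatever shared store a distributed crawler would actually use.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit


def fingerprint(url: str) -> str:
    """Return a stable hash of a URL after light normalization.

    Normalization here (an assumption, not the thesis's rules):
    lowercase the scheme and host, default an empty path to "/",
    and drop the fragment.
    """
    parts = urlsplit(url.strip())
    normalized = urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query, "")
    )
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


class SeenFilter:
    """Remembers fingerprints of URLs that were already scheduled.

    A distributed deployment would keep this set in a store shared by
    all crawler machines; a plain Python set is used to stay runnable.
    """

    def __init__(self) -> None:
        self._seen = set()

    def add_if_new(self, url: str) -> bool:
        """Record the URL and return True only the first time it is seen."""
        fp = fingerprint(url)
        if fp in self._seen:
            return False
        self._seen.add(fp)
        return True


def filter_new(urls, seen: SeenFilter):
    """Keep only URLs not seen in this or any earlier crawl round."""
    return [u for u in urls if seen.add_if_new(u)]


if __name__ == "__main__":
    seen = SeenFilter()
    # First round: the first two URLs differ only by host case and fragment.
    print(filter_new(
        ["http://Example.com/a#x", "http://example.com/a", "http://example.com/b"],
        seen,
    ))
    # Second (incremental) round: only the URL not seen before is kept.
    print(filter_new(["http://example.com/b", "http://example.com/c"], seen))
```

Because the filter persists across rounds, a later crawl only schedules URLs it has not seen, which is the basic idea behind incremental crawling. Exact matching on fingerprints removes every duplicate under the chosen normalization rules, but URLs that differ in ways the rules ignore (for example, reordered query parameters) would still be treated as distinct.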