
Design and Implementation of a Nutch Crawler System Based on LinkedIn and Microsoft Academic

Posted on: 2016-07-02    Degree: Master    Type: Thesis
Country: China    Candidate: M Lu    Full Text: PDF
GTID: 2348330482497037    Subject: Software engineering
Abstract/Summary:
With the development of computer, communication, and network technology, the era of big data has arrived, and online information has become an important part of the information available to modern society. The web holds an enormous amount of information. Acquiring and analyzing it helps users find useful information quickly and makes it feasible to build a database of high-level personnel worldwide. This research addresses the problem of obtaining such information. This thesis shows how the author obtains information on high-level personnel around the world from the LinkedIn and Microsoft Academic websites, using a crawler built through secondary development on Nutch. The crawler relies on a proxy IP mechanism, simulated login, website analysis, and storage of the extracted information in a database, and it marks the records of people of ethnic Chinese origin. Testing shows that the program obtains personnel information comprehensively and efficiently.
Keywords/Search Tags: Nutch, simulating login, analyzing websites, crawler