
Research And Application Of Crawler Algorithm In Internet Public Opinion System

Posted on: 2016-12-15
Degree: Master
Type: Thesis
Country: China
Candidate: J Ji
GTID: 2308330479498283
Subject: Computer application technology
Abstract/Summary:
With the rapid development of technology, a new “age of information” has arrived, and the Internet has become one of the most widely used information carriers. Given the large volume of information on the Internet, how to monitor and search it has become a research focus. Because general-purpose search engines cannot meet the needs of specific groups of users, the focused crawler has emerged, and it supplies the data for the Internet public opinion system.

Building on existing research in China and abroad, this thesis designs an Internet public opinion system based on an analysis of the current Internet environment, and studies and designs a focused crawler for it. It also improves the algorithms that most affect crawler performance and integrates the crawler into the system. The main research content is as follows:

1) Compare the focused crawler with the general crawler, and study search strategies and page evaluation algorithms. After comparing the search strategies, the best-first strategy is chosen, and the vector space model is selected for Web page evaluation. Two important issues are studied: topic isolated islands and the Robots Exclusion Protocol. The main structure of the focused crawler used in this thesis is then designed on this basis.

2) Analyze the characteristics of the Internet environment and derive a requirements analysis from them. Design the main structure of the system based on that requirements analysis.

3) Implement the crawler, including the crawling strategy, page analysis strategy, duplicate page removal, and multitasking strategy. Improve the crawling strategy with a dynamic keyword expansion strategy. Improve the duplicate page removal and multitasking strategies, and carry out experiments that test the performance of the crawler.

The focused crawler designed in this thesis combines keyword expansion with these algorithm improvements to raise its performance. According to the test results, it performs better than the common crawler.
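
To make the combination of a best-first search strategy and vector space model page evaluation more concrete, the following is a minimal sketch, not the thesis implementation. It assumes raw term-frequency vectors, cosine similarity as the relevance score, and a priority queue as the crawl frontier; the names `term_vector`, `cosine_similarity`, and `BestFirstFrontier` are hypothetical and chosen only for illustration.

```python
import heapq
import math
from collections import Counter


def term_vector(text):
    """Build a raw term-frequency vector (assumption: whitespace tokens)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class BestFirstFrontier:
    """Hypothetical crawl frontier that always yields the highest-scored URL."""

    def __init__(self):
        self._heap = []      # entries are (-score, insertion order, url)
        self._seen = set()   # URLs already queued, to skip duplicates
        self._counter = 0    # tie-breaker that keeps insertion order

    def push(self, url, score):
        """Queue a URL; return False if it was already queued."""
        if url in self._seen:
            return False
        self._seen.add(url)
        # heapq is a min-heap, so the score is negated to pop the best first.
        heapq.heappush(self._heap, (-score, self._counter, url))
        self._counter += 1
        return True

    def pop(self):
        """Return (url, score) for the most relevant queued URL."""
        neg_score, _, url = heapq.heappop(self._heap)
        return url, -neg_score

    def __len__(self):
        return len(self._heap)


if __name__ == "__main__":
    topic = term_vector("public opinion crawler")
    frontier = BestFirstFrontier()
    pages = {
        "http://example.com/a": "focused crawler for public opinion monitoring",
        "http://example.com/b": "cooking recipes and kitchen tips",
    }
    for url, text in pages.items():
        frontier.push(url, cosine_similarity(topic, term_vector(text)))
    print(frontier.pop())
```

In a real focused crawler the score would typically come from the text of the linking page or its anchor text, and term weights such as TF-IDF might replace raw counts; the abstract does not specify either choice.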
Keywords/Search Tags: Internet public opinion, focused crawler, dynamically expanding, consistent hashing
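
The keywords list consistent hashing, but the abstract does not say where it is applied. One common use in crawlers is to assign URLs or hosts to crawl tasks so that adding or removing a task remaps only a small share of keys. The sketch below illustrates that general technique under this assumption; the class `ConsistentHashRing`, the MD5-based hash, and the replica count are hypothetical choices, not details taken from the thesis.

```python
import bisect
import hashlib


def _hash(key):
    """Map a string to a large integer position on the ring."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    """Hypothetical ring that maps keys (e.g. URLs) to nodes (e.g. crawl tasks)."""

    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas  # virtual points per node, to even out load
        self._ring = []           # sorted list of (position, node)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        """Place `replicas` virtual points for the node on the ring."""
        for i in range(self.replicas):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        """Drop every virtual point that belongs to the node."""
        self._ring = [(pos, n) for pos, n in self._ring if n != node]

    def get_node(self, key):
        """Return the node owning the first ring point at or after the key."""
        if not self._ring:
            raise ValueError("ring is empty")
        idx = bisect.bisect_left(self._ring, (_hash(key), ""))
        if idx == len(self._ring):
            idx = 0  # wrap around to the start of the ring
        return self._ring[idx][1]


if __name__ == "__main__":
    ring = ConsistentHashRing(["task-0", "task-1", "task-2"])
    for url in ["http://example.com/a", "http://example.org/b"]:
        print(url, "->", ring.get_node(url))
```

The property that matters here is that removing a node changes the owner only for keys that node held; every other key keeps its assignment, which limits the work that has to be redistributed.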