
Research On Webpage Text Extraction And Management Based On Internet Information Retrieval

Posted on: 2015-09-10
Degree: Master
Type: Thesis
Country: China
Candidate: H Yu
GTID: 2298330467977010
Subject: Computer technology
Abstract/Summary:
With the development of modern society and rapid changes in the geographical environment, traditional geographic information mapping faces many problems. As the most important information carrier today, the Internet offers strong timeliness and low cost, which makes it a new channel for retrieving geographic information. Combining network information retrieval with natural language processing makes it possible to acquire geographic information from the vast volume of content on the Internet: updates to geographic information can be searched for and detected in real time, compensating for the shortcomings of traditional mapping methods.

This thesis studies methods of network information retrieval. Focusing on the poor generality of current topic-focused web crawlers, it proposes a new topic-focused crawler algorithm based on backtracking. Exploiting the structural features of current news websites, the method uses backtracking to calculate the link paths most likely to lead to topic information, thereby increasing crawling efficiency. Combining web mining and natural language processing, the thesis also presents methods for extracting web text elements and geographic information elements, which acquire information from web pages accurately. Finally, the thesis implements a prototype geographic information update detection system based on the topic-focused web crawler. Results from multiple experiments show that the prototype system has excellent usability and good recall and precision, and that the backtracking-based topic-focused crawler performs much better than a traditional web crawler.
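The abstract does not say how the backtracking step scores link paths, so the following is only a rough illustration under stated assumptions, not the thesis's algorithm. It assumes that a link's "path" can be summarized by its first URL path segment (e.g. `/local`), that each time an on-topic page is found the crawler walks back through the chain of parent links to the seed and credits every pattern on that chain with a depth-discounted weight `1 / (1 + depth)`, and that frontier links are then crawled best-first by the credit of their pattern. The names `path_pattern` and `backtracking_crawl`, the weighting scheme, and the in-memory link graph (standing in for real HTTP fetching and a real topic classifier) are all hypothetical.

```python
import heapq
from collections import defaultdict
from urllib.parse import urlparse


def path_pattern(url: str) -> str:
    """Hypothetical link feature: the first path segment of a URL, e.g. '/local'."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    return "/" + segments[0] if segments else "/"


def backtracking_crawl(graph, is_topic, seed, max_pages=50):
    """Best-first crawl over an in-memory link graph (illustrative sketch).

    graph:    dict mapping URL -> list of outgoing URLs (stands in for fetching pages).
    is_topic: predicate deciding whether a page is on-topic (stands in for a classifier).
    Returns the on-topic URLs in the order they were found.
    """
    parent = {seed: None}          # first-discovery parent of each URL, used for backtracking
    score = defaultdict(float)     # learned credit per path pattern
    frontier = [(0.0, 0, seed)]    # (negated priority, insertion counter, url)
    counter = 1
    visited, found = set(), []

    while frontier and len(visited) < max_pages:
        _, _, url = heapq.heappop(frontier)
        if url in visited:
            continue
        visited.add(url)

        if is_topic(url):
            found.append(url)
            # Backtrack: credit every pattern on the path seed -> ... -> url,
            # giving links closer to the topic page a larger share.
            node, depth = url, 0
            while node is not None:
                score[path_pattern(node)] += 1.0 / (1 + depth)
                node, depth = parent[node], depth + 1

        for link in graph.get(url, []):
            if link not in visited:
                parent.setdefault(link, url)
                # Priority is fixed at push time; ties fall back to FIFO order.
                heapq.heappush(frontier, (-score[path_pattern(link)], counter, link))
                counter += 1

    return found
```

With a small site where `/local/a` links to `/sports/2`, `/sports/3` and `/local/c`, finding `/local/a` raises the credit of the `/local` pattern, so `/local/c` is fetched before the sports pages even though it was discovered last. For simplicity the sketch does not re-rank links already in the frontier when scores change; a fuller version might periodically re-score the queue.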
Keywords/Search Tags: information retrieval, topic-focused web crawler, backtracking algorithm, web mining, natural language processing