Research On Webpage Text Extraction And Management Based On Internet Information Retrieval

Posted on:2015-09-10

Degree:Master

Type:Thesis

Country:China

Candidate:H Yu


GTID:2298330467977010

Subject:Computer technology

Abstract/Summary:

With the development of modern society and rapid changes in the geographical environment, traditional geographic information mapping faces many problems. As the most important information carrier today, the Internet offers timely information at low cost, which makes it a new channel for retrieving geographical information. Combining network information retrieval with natural language processing makes it possible to acquire geographical information from the vast volume of content on the Internet, to find updates to geographical information, and to detect changes in real time, compensating for the shortcomings of traditional mapping methods.

This thesis studies methods of network information retrieval, focusing on the poor generality of current topic-focused web crawlers, and proposes a new topic-focused crawler algorithm based on backtracking. Exploiting the structural features of current news websites, the method backtracks to calculate the link paths most likely to lead to topic-related information, which increases crawling efficiency. By combining web mining and natural language processing, the thesis also presents methods for extracting web text elements and geographical information elements, so that information can be acquired accurately from web pages. Finally, the thesis implements a prototype geographical information update detection system based on the topic-focused web crawler. Experimental results show that the prototype has excellent usability and good recall and precision, and that the topic-focused crawler based on the backtracking algorithm performs much better than a traditional web crawler.
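The abstract does not give the details of the backtracking algorithm, so the following is only a minimal illustrative sketch of the general idea, not the thesis's actual method. It assumes the crawler records the parent link of every visited page; when a page is judged topic-relevant, it walks back along the parent links and credits each ancestor, so that links found under well-credited pages can be crawled first. The function names, the URL paths, and the simple counting score are all hypothetical.

```python
from collections import defaultdict


def backtrack_scores(parent, relevant_pages):
    """Credit every ancestor of each relevant page by walking parent links back to the seed.

    `parent` maps a URL to the URL it was discovered from (None for the seed).
    This counting scheme is an assumption made for illustration only.
    """
    scores = defaultdict(int)
    for page in relevant_pages:
        seen = {page}
        node = parent.get(page)
        # Walk upward until the seed is reached (or a cycle is detected).
        while node is not None and node not in seen:
            scores[node] += 1
            seen.add(node)
            node = parent.get(node)
    return dict(scores)


def rank_frontier(frontier, scores):
    """Order (url, parent_url) pairs so links under high-scoring parents come first."""
    return sorted(frontier, key=lambda item: scores.get(item[1], 0), reverse=True)


# Hypothetical crawl tree of a news site.
parent = {
    "/": None,
    "/news": "/",
    "/sports": "/",
    "/news/geo": "/news",
    "/news/geo/a1": "/news/geo",
    "/news/geo/a2": "/news/geo",
    "/sports/s1": "/sports",
}
relevant = ["/news/geo/a1", "/news/geo/a2"]

scores = backtrack_scores(parent, relevant)
frontier = [("/sports/s2", "/sports"), ("/news/geo/a3", "/news/geo")]
ranked = rank_frontier(frontier, scores)
```

In this toy example the uncrawled link under "/news/geo" is ranked ahead of the one under "/sports", because both relevant pages were reached through that section. A real system would replace the raw count with whatever path-probability estimate the thesis defines.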

Keywords/Search Tags:

information retrieval, topic-focused web crawler, backtracking algorithm, web mining, natural language processing

Related items

1	The Focused Crawler Based On URL And Context
2	Research On Topic Focused Web Crawler And Related Technologies
3	Research On Autonomous Task Planning And Execution Of Service Robot Based On Natural Language
4	Design And Implementation Of Focused Crawler For Blogs
5	The Design And Implementation Of The Topic-focused Web Crawler System
6	The Research And Implement Of Topic-focused Web Crawler Based On SVM Classification Algorithm
7	Research On Focused Hidden Web Crawler
8	Research On Machine Learning For Natural Language Processing And Transmission
9	Research On The Topic Crawler Algorithm Based On Vector Space Model
10	Research And Application Of Natural Language Processing In Information Retrieval