Research On Network Information Extraction Based On Strategy

Posted on:2014-02-08

Degree:Master

Type:Thesis

Country:China

Candidate:Y H Wu

Full Text:PDF

GTID:2268330401466814

Subject:Electronic and communication engineering

Abstract/Summary:

Information collection was already regarded as important before the information age, and it now receives unprecedented attention, especially in some application areas. With the rapid development of the Internet, the amount of information on the network has grown greatly, which makes it convenient to use. However, the workload of collecting that information grows daily as its volume increases. This thesis takes the theory of network information retrieval and text information extraction technologies as its main subjects, and proposes and designs software for extracting network information.

1. A method for network information extraction is proposed, based on the theory of network information collection and the technology of information extraction. Web pages are fetched with web crawler technology and analyzed, so that users obtain information that conforms to the format defined by an information extraction strategy they set themselves.

2. Web crawler technology, URL de-duplication, the Hypertext Transfer Protocol, the Hypertext Markup Language and regular expressions are also discussed. Moreover, the thesis analyzes how commercial search engines operate and sets out the operating procedures of these engines.

3. Strategy-based network information extraction software is designed and implemented. The software constructs the information extraction strategy on the basis of regular expressions and extracts the useful information. It provides a graphical user interface for editing the information extraction strategy, implements a web crawler, and can call search engines. Finally, experiments on the functions and efficiency of the software are carried out to verify whether it meets the expected goals. The remaining problems are then discussed and improvement measures are given.
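
The abstract does not show the software itself, so the following is only a minimal Python sketch of the two ideas it names: an extraction strategy built from regular expressions, and URL de-duplication for a crawler. The field names, patterns, sample page and function names (`STRATEGY`, `extract`, `new_links`) are hypothetical and chosen for illustration; they are not taken from the thesis. The sketch works on an in-memory page and does no network access.

```python
import re
from urllib.parse import urldefrag, urljoin

# Hypothetical user-defined strategy: field name -> regex with one capture group.
STRATEGY = {
    "title": r"<title>(.*?)</title>",
    "email": r"([\w.+-]+@[\w-]+\.[\w.-]+)",
}

# Simplified link pattern (assumes double-quoted href attributes).
LINK_RE = re.compile(r'href="([^"]+)"', re.IGNORECASE)


def extract(html, strategy):
    """Apply every pattern in the strategy and collect all matches per field."""
    return {
        field: re.findall(pattern, html, flags=re.IGNORECASE | re.DOTALL)
        for field, pattern in strategy.items()
    }


def new_links(html, base_url, seen):
    """Return links not visited before and record them in `seen`."""
    fresh = []
    for href in LINK_RE.findall(html):
        # Resolve relative links and drop the #fragment before comparing.
        url, _fragment = urldefrag(urljoin(base_url, href))
        if url not in seen:
            seen.add(url)
            fresh.append(url)
    return fresh


# Hypothetical sample page standing in for a fetched document.
page = """<html><head><title>Contact</title></head>
<body><a href="/about#team">About</a> <a href="/about">About again</a>
<a href="http://example.com/news">News</a> Mail: info@example.com</body></html>"""

seen = {"http://example.com/"}
print(extract(page, STRATEGY))
print(new_links(page, "http://example.com/", seen))
```

Keeping the strategy as plain data means a user interface could edit the patterns without touching the extraction code, which is one plausible reading of the strategy-based design described above. A real crawler would also need an HTML parser rather than a link regex, plus fetching, politeness and error handling.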

Keywords/Search Tags:

Web Crawler, Information Extraction, Search Engine

Related items

1	Research Of Main Technologies Of Vertical Search Engine
2	The Research And Implementation Of Cubic Relationship Search Engine In Taiwan Field
3	Research Of Intranet Information Supervision System Based On Net Crawler And Full-text Search Engine
4	Research And Implement Of Individualized Vertical Search Engine
5	Research And Implementation Of Vertical Search Engine
6	Research On Information Extraction And Search Based On Web
7	Research And Realization On Focused Crawler Key Technologies Of Vertical Search Engine
8	Research On Web Crawler Technology In Search Engine
9	Research On Network Information Extraction Based On Strategy
10	Design And Implementation Of Vertical News Search Engine Based On Heritrix