Research On Network Information Extraction Based On Strategy

Posted on: 2014-02-08
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Wu
Full Text: PDF
GTID: 2268330401466814
Subject: Electronic and communication engineering
Abstract/Summary:
Before the information age, the collection of information was heavily constrained. It now receives unprecedented attention, especially in certain application areas. With the rapid development of the Internet, the amount of information on the network has grown greatly, which makes that information convenient to use. However, the workload of collecting it grows daily as the volume of information increases.

This thesis takes the theory of network information retrieval and text information extraction technologies as its main subjects, and proposes and designs software for extracting network information.

1. A method for network information extraction is proposed, based on the theory of network information collection and on information extraction technology. Web pages are fetched with web crawler technology and analyzed, so that users obtain the matching information, in the required format, according to an information extraction strategy they set themselves.

2. Web crawler technology, URL deduplication, the Hypertext Transfer Protocol (HTTP), the Hypertext Markup Language (HTML), and regular expressions are also discussed. In addition, the thesis analyzes how commercial search engines operate and outlines their operating procedures.

3. Strategy-based network information extraction software is designed and implemented. The software builds the information extraction strategy from regular expressions and uses it to extract the useful information. It provides a graphical user interface for modifying the extraction strategy, implements a web crawler, and can call search engines. Finally, experiments on the functions and efficiency of the software are carried out to verify whether it meets the expected goals. The remaining problems are then discussed and improvement measures are given.
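
As an illustration only, the idea of a regular-expression-based extraction strategy combined with URL deduplication might be sketched in Python as follows. This is not the software described in the thesis: the names `STRATEGY`, `extract`, `new_links`, and `SAMPLE`, the two patterns, and the sample page are all hypothetical, and a real crawler would fetch pages over HTTP and would likely use a proper HTML parser rather than a regular expression to find links.

```python
import re
from urllib.parse import urldefrag, urljoin

# Hypothetical user-defined strategy: field name -> regex with one capture group.
STRATEGY = {
    "title": r"<title>(.*?)</title>",
    "email": r"([\w.+-]+@[\w-]+\.[\w.-]+)",
}

# Simplified link pattern (assumption: double-quoted href attributes only).
LINK_RE = re.compile(r'href="([^"]+)"', re.IGNORECASE)


def extract(html, strategy):
    """Apply every pattern in the strategy and collect all matches per field."""
    return {
        field: re.findall(pattern, html, flags=re.IGNORECASE | re.DOTALL)
        for field, pattern in strategy.items()
    }


def new_links(html, base_url, seen):
    """Return links not seen before, after resolving them and dropping fragments."""
    fresh = []
    for href in LINK_RE.findall(html):
        url, _fragment = urldefrag(urljoin(base_url, href))
        if url not in seen:  # URL deduplication via a set of visited URLs
            seen.add(url)
            fresh.append(url)
    return fresh


# Made-up page used in place of a real HTTP response.
SAMPLE = (
    "<html><head><title>Contact</title></head><body>"
    '<a href="/a#top">A</a> <a href="/a">A again</a> '
    "Write to info@example.com</body></html>"
)

if __name__ == "__main__":
    print(extract(SAMPLE, STRATEGY))
    print(new_links(SAMPLE, "http://example.com/", set()))
```

Keeping the strategy as plain data (a mapping from field names to patterns) is one way a graphical interface could let users edit it without changing the extraction code; whether the thesis software is structured this way is not stated in the abstract.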
Keywords/Search Tags: Web Crawler, Information Extraction, Search Engine