Design And Implementation Of Web Information Extraction System SEU-WIE

Posted on:2007-03-26

Degree:Master

Type:Thesis

Country:China

Candidate:Y Yu

GTID:2178360212465628

Subject:Computer application technology

Abstract/Summary:

With the development of the Internet, the World Wide Web has become a huge space of distributed information. However, users cannot quickly find the information they need because of the Web's inherent properties: it is open, dynamic, and heterogeneous. Obtaining the required information quickly and accurately from this huge resource has become a difficult problem. Various Web information extraction technologies have emerged in response, but all of them have limitations in application.

This paper studies Web information extraction technology, analyzes the requirements of the project, and then presents the design and implementation of SEU-WIE, a Web information extraction system developed in this work. The system separates the definition of extraction rules from their execution and provides a user-friendly interface, which gives it generality and flexibility. It therefore consists of two parts: extraction rule definition and extraction rule execution. In the rule-definition phase, data represented in HTML is first transformed into a well-formed XML document, and the DOM tree of that document is obtained. The user then specifies the location of the information to be extracted and maps it to the target table, which defines the extraction rules. In the rule-execution phase, the system first locates the data block using the XPath expression in the user-defined extraction rules, then obtains the ontology information and extracts the data with the IEOntoMatch algorithm. Finally, it stores the extracted data in a structured form.

The paper also covers data pre-processing. Data extracted from the Web has a variety of quality problems, so it must be cleaned, transformed, and integrated.
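The pipeline outlined in the abstract (HTML to well-formed XML, a tree of that document, an XPath that selects the data block, a mapping to target-table columns, then cleaning) can be illustrated with a minimal Python sketch. It is not the SEU-WIE implementation: the names `HtmlToXml`, `RULE`, and `extract` are hypothetical, the sample page is invented, and the ontology lookup and the IEOntoMatch algorithm are not reproduced because the abstract does not specify them. A plain column mapping stands in for that step, and the XPath expressions are limited to the subset supported by the standard library's `xml.etree.ElementTree`.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# HTML elements that never have a closing tag.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class HtmlToXml(HTMLParser):
    """Hypothetical helper: build a well-formed element tree from loose HTML."""

    def __init__(self):
        super().__init__()
        self.builder = ET.TreeBuilder()
        self.open_tags = []  # stack of elements still waiting for an end tag

    def handle_starttag(self, tag, attrs):
        self.builder.start(tag, {k: v or "" for k, v in attrs})
        if tag in VOID_TAGS:
            self.builder.end(tag)  # close void elements immediately
        else:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray closing tag: ignore it
        while self.open_tags:  # close unclosed children, then the tag itself
            top = self.open_tags.pop()
            self.builder.end(top)
            if top == tag:
                break

    def handle_data(self, data):
        if self.open_tags:  # skip text outside the root element
            self.builder.data(data)

    def close(self):
        super().close()
        while self.open_tags:  # close whatever the page left open
            self.builder.end(self.open_tags.pop())
        return self.builder.close()


# Assumed rule format: an XPath for the data block plus, for each target
# column, a relative XPath and a conversion used for cleaning.
RULE = {
    "block": ".//table[@id='books']/tr",
    "fields": {"title": ("td[1]", str), "price": ("td[2]", float)},
}

# Invented sample page: unquoted attribute, <br> without a close tag,
# and missing </body></html>.
SAMPLE_HTML = """
<html><body>
<table id=books>
<tr><td>XML Basics<br></td><td> 12.50 </td></tr>
<tr><td>DOM in Practice</td><td>30</td></tr>
</table>
"""


def extract(html, rule):
    """Apply one extraction rule and return cleaned records."""
    parser = HtmlToXml()
    parser.feed(html)
    root = parser.close()
    records = []
    for block in root.findall(rule["block"]):
        record = {}
        for column, (path, convert) in rule["fields"].items():
            node = block.find(path)
            text = "".join(node.itertext()) if node is not None else ""
            # Cleaning step: collapse whitespace, then convert the type.
            record[column] = convert(" ".join(text.split()))
        records.append(record)
    return records


records = extract(SAMPLE_HTML, RULE)
```

Keeping the rule as plain data, separate from the `extract` function that runs it, mirrors the split between rule definition and rule execution that the abstract describes.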

Keywords/Search Tags:

Web Information Extraction, Extraction rules, XML, DOM, ontology, IEOntoMatch, data pre-processing

Related items

1	Ontology-Based Structured Information Extraction From Web Pages
2	Adaptive Web Information Extraction Method Research Based On Ontology
3	Research On Web Information Extraction Technology Based On Frame Semantic Tagging
4	Heuristic rules for extraction of ontology from Web pages in WebOntEx
5	Research On Web Product Indicator Extraction Based On Ontology
6	Information Extraction Technology Based On Ontology Web Non-normative Knowledge Processing
7	Study On Ontology-based Information Extraction Of Emergency Case
8	Research On The Ontology-based Information Extraction For Personal Homepage
9	XML-based WEB Information Extraction System Research And Implementation
10	Research On Language And Key Techniques For Accurate Information Extraction Rules Towards Complex Web