Research Of Web Information Extraction Technology Based On Semantics

Posted on:2010-10-02

Degree:Master

Type:Thesis

Country:China

Candidate:W L Huang

Full Text:PDF

GTID:2178360272479351

Subject:Computer software and theory

Abstract/Summary:

As a global information space, the Web holds tremendous value, and extracting exactly the information a user needs from its complex data has become an important problem. Although a great deal of research has been carried out on Web data extraction, existing techniques lack a description of the data itself, carry no explicit semantic information, and do not define specific patterns. They therefore adapt poorly to the Web's diversity of structure and pattern, so application programs cannot directly analyze and use the mass of information on the Web, and much of it goes to waste. This thesis applies suffix tree technology to the data characteristics of a knowledge-intensive web site to extract usable data patterns, builds a domain ontology with the Protégé tool to add semantic information during extraction, and uses a semantics-based construction method to eliminate the heterogeneity among information sources of the same kind on the web site. The emphasis is on the implementation techniques for extracting information from knowledge-intensive web sites, based on an overall solution that combines ontology technology with semi-structured Web information extraction.
Based on an analysis of the fundamental principles, techniques, and development status of conventional information extraction methods, this thesis proposes an ontology-driven information extraction model that locates information through document structure and feature matching, and describes the model's design and extraction flow in detail. To resolve heterogeneity among web documents, the system first fetches the specified HTML page and converts it into a well-formed XML document with a conversion algorithm based on a stack structure and a link structure. It then extracts data patterns from the XML document with suffix tree technology, adds semantic information to the extracted data through the ontology construction method, formally describes the domain ontology in OWL (the Web Ontology Language), generates an extraction rule base, and converts the extracted data into an RDF data model that carries semantic information. In short, the thesis attaches semantic information by applying an ontology and extracts the data patterns of the web site structure with suffix tree technology. The work achieves pattern extraction for the information sources of knowledge-intensive web sites, which helps users discover valuable information resources on the Web and provides an effective tool for using its mass of data.
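
The abstract does not give the conversion algorithm itself, so the following is only a minimal sketch of how a stack-based HTML-to-XML conversion step might look, not the thesis's implementation. It assumes Python's standard html.parser module; the class name StackXmlConverter, the helper html_to_xml, and the wrapping <document> root element are hypothetical names introduced for illustration. The link-structure part of the algorithm is not covered.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr
import xml.etree.ElementTree as ET

# HTML elements that never take a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class StackXmlConverter(HTMLParser):
    """Hypothetical converter: rewrites loose HTML as well-formed XML
    by tracking open elements on a stack."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.stack = []  # names of elements that are currently open
        self.out = []    # output fragments, joined at the end

    def _attrs(self, attrs):
        # Valueless attributes (e.g. `disabled`) reuse their name as value.
        return "".join(
            f" {name}={quoteattr(value if value is not None else name)}"
            for name, value in attrs
        )

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            self.out.append(f"<{tag}{self._attrs(attrs)}/>")
        else:
            self.stack.append(tag)
            self.out.append(f"<{tag}{self._attrs(attrs)}>")

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray closing tag with no matching open tag: drop it
        # Close every element opened after the match, then the match itself.
        while self.stack:
            open_tag = self.stack.pop()
            self.out.append(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data))  # escape &, <, > in text nodes

    def convert(self, html):
        self.feed(html)
        self.close()
        while self.stack:  # close whatever is still open at end of input
            self.out.append(f"</{self.stack.pop()}>")
        # Assumption: wrap everything in one root so the result is a
        # single XML document.
        return "<document>" + "".join(self.out) + "</document>"


def html_to_xml(html):
    return StackXmlConverter().convert(html)


if __name__ == "__main__":
    xml_text = html_to_xml("<ul><li>First<li><b>Second</ul><br><p class=x>A & B")
    root = ET.fromstring(xml_text)  # raises ParseError if not well-formed
    print([child.tag for child in root])
```

The sketch nests an unclosed <li> inside the previous one instead of applying HTML's implicit-close rules, and it does not deduplicate repeated attribute names, so a production converter would need more care on both points. Comments and doctype declarations are silently dropped.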

Keywords/Search Tags:

Suffix tree, Semantics, Web information extraction, Ontology, XML

Related items

1	Research Of Web Information Extraction Technique Based On Ontology And Text Feature
2	Research Of A Suffix Tree Based Automatic Wrapper Generation Method
3	Ontology-Based Structured Information Extraction From Web Pages
4	Finding MUMs With Enhanced Suffix Arrays
5	Automatic Extraction Of Uyghur Ontology Concept Classification Relationship Based On Seed Bootstrap
6	Research On Construction Of Index Structure For Biological Sequences
7	Extraction Technology Research, Based On Ontology Can Be Customized Web Information Intelligence
8	The Research Of Web Text Clustering Based On Ontology
9	Adaptive Web Information Extraction Method Research Based On Ontology
10	Multi-pattern Matching With Wildcards Based On Suffix Tree And Suffix Array