Web Information Extraction System Based On Neural Networks

Posted on:2007-05-18

Degree:Master

Type:Thesis

Country:China

Candidate:T B Ming

Full Text:PDF

GTID:2208360185456048

Subject:Computer application technology

Abstract/Summary:

With the rapid development of the Internet, the Web has become one of the most important knowledge repositories, and extracting and making full use of this information and knowledge is a valuable and promising research field. The goal of Web Information Extraction (Web IE) is to locate and identify information of interest on heterogeneous web sites and to organize the extracted information in a homogeneous, structured format. The inherent features of web sites, namely their huge number, varied formats, and frequent updates, create problems of complexity, adaptability, and scalability for Web IE.

Based on an analysis of the features of semi-structured documents, this thesis proposes an architecture for Web IE that uses a BP (back-propagation) neural network. The system uses XML as the representation model for web pages and a BP neural network to learn extraction rules. It comprises several knowledge repositories and three modules: a web page preprocessor, rule learning, and information extraction. It describes web pages from four aspects: semantic content display, logical structure, rule generation, and extraction results.

The thesis concentrates on the rule learning method based on the BP neural network, in which the definition of a rule combines a web page's path, left/right boundary, and semantic features. The neural network takes the label elements of the filtered DOM tree as input and the extraction results as output, and is trained with the BP learning algorithm. After training, a rule learning algorithm generates simple and robust rules that the information extraction module can use.

Experiments indicate that the system can learn rules for a domain of interest and that it has good adaptability and scalability.
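The rule learning idea described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the sample page, the three hand-coded features (path, left boundary, semantic cue), the `TARGETS` annotations, the `BPNet` class, and the network size are all assumptions chosen for illustration. It parses a small XML page, encodes each leaf element as a feature vector, trains a one-hidden-layer network with back-propagation, and then extracts the nodes the network scores above 0.5.

```python
import math
import random
import xml.etree.ElementTree as ET

# Hypothetical page, assumed to be already cleaned into well-formed XML.
PAGE = """<html><body>
<div><span>Title:</span><b>Neural Nets</b><span>Price:</span><i>30</i></div>
<div><span>Title:</span><b>Web Mining</b><span>Price:</span><i>45</i></div>
<div><span>Title:</span><b>2007</b><span>Price:</span><i>12</i></div>
</body></html>"""
TARGETS = {"30", "45", "12"}  # hypothetical hand-made annotations


def features(path, left, text):
    """Encode a leaf as [path cue, left-boundary cue, semantic cue]."""
    return [
        1.0 if path.endswith("div/i") else 0.0,  # path feature
        1.0 if left == "Price:" else 0.0,        # left boundary
        1.0 if text.isdigit() else 0.0,          # semantic cue
    ]


def walk(node, path=""):
    """Yield (feature vector, text) for every leaf element, in document order."""
    path = f"{path}/{node.tag}"
    left = ""
    for child in node:
        if len(child) == 0:
            text = (child.text or "").strip()
            yield features(f"{path}/{child.tag}", left, text), text
            left = text
        else:
            yield from walk(child, path)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class BPNet:
    """One hidden layer, sigmoid units, squared error, online updates."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # The last weight in each row is the bias.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)]
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        xb = x + [1.0]
        h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w1]
        y = sigmoid(sum(w * v for w, v in zip(self.w2, h + [1.0])))
        return h, y

    def train(self, data, epochs=3000, lr=0.5):
        for _ in range(epochs):
            for x, t in data:
                h, y = self.forward(x)
                xb, hb = x + [1.0], h + [1.0]
                # Error terms, propagated from the output back to the hidden layer.
                d_out = (y - t) * y * (1 - y)
                d_hid = [d_out * self.w2[j] * h[j] * (1 - h[j])
                         for j in range(len(h))]
                self.w2 = [w - lr * d_out * v for w, v in zip(self.w2, hb)]
                for j, row in enumerate(self.w1):
                    self.w1[j] = [w - lr * d_hid[j] * v for w, v in zip(row, xb)]


root = ET.fromstring(PAGE)
samples = [(x, 1.0 if text in TARGETS else 0.0) for x, text in walk(root)]
net = BPNet(n_in=3, n_hidden=2)
net.train(samples)

# Extraction step: keep the leaves the trained network scores above 0.5.
extracted = [text for x, text in walk(root) if net.forward(x)[1] > 0.5]
print(extracted)
```

The third item's title, `2007`, is numeric but is not a price, so the semantic cue alone is not enough and the network has to weigh it together with the path and boundary cues. Converting the trained weights into explicit extraction rules, as the thesis describes, is not attempted here.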

Keywords/Search Tags:

Web Information Extraction, XML, BP Neural Network, Rule Learning

Related items

1	Knowledge Discovery And Control Rule Extraction Based Fuzzy Neural Network
2	Design And Implementation Of Web Information Extraction Rules
3	Research Of Rule Extraction And Classification Algorithm Based On Neural Network
4	Research On Chinese Entity Relation Extraction Based On Rule Matching And Neural Network Learning
5	The Study Of Rule Induction For Automatic WEB Data Extraction
6	Research On Network Information Extraction And Visualization Technology Of Incident
7	Semi-structured Web Information Extraction Technology And Its Application
8	Research And Implementation Of Chinese Information Extraction Based On Deep Learning
9	Web Information Extraction Rules And Their Learning Algorithms
10	Chinese Information Extraction Based On Hybrid Neural Network