Study On Information Extraction And The Index Of Topic Search Engine

Posted on:2008-09-25

Degree:Master

Type:Thesis

Country:China

Candidate:M Yu

Full Text:PDF

GTID:2178360242471636

Subject:Computer software and theory

Abstract/Summary:

With the explosive growth of the World Wide Web, "information overload" has become a serious problem. To help users find exactly the information they want on the Web, information must be extracted from web pages. A program that performs this task is called a wrapper. The key requirements are that a wrapper can be constructed rapidly and with little human intervention; that it is robust, adapting to changes in web pages; and that it is as general as possible, that is, independent of any particular web site.

Many approaches have been proposed to ease wrapper generation. Almost all of them use proprietary extraction languages. These languages are simple, which makes it hard to express accurate or complex extraction patterns. Although extraction rules can be induced automatically from labeled examples, the resulting rules are not accurate, robust, or general. We apply standard XML technologies to the web information extraction problem. With standard XSLT, we can exploit the strong and flexible features of the language to construct simple, robust, and general extraction rules. We have developed a platform to ease wrapper construction.

Extraction rules fail mainly because their XPath expressions fail. This thesis studies methods for optimizing extraction rules and puts forward several improved location methods. Moreover, a strategy for combining these methods is put forward so that the generated rules stay simple. These methods have been applied to information extraction and yield better precision.
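The abstract does not give the improved location methods themselves, so the following is only a minimal sketch of the general idea it describes: a purely positional XPath breaks when the page layout shifts, while an attribute-anchored one survives, and several locators can be combined by trying them in order. The page snippets, the `LOCATORS` list, and the `extract` function are hypothetical names introduced for illustration. The sketch uses the limited XPath subset of Python's standard `xml.etree.ElementTree` rather than XSLT, and it assumes well-formed (XHTML-like) input.

```python
import xml.etree.ElementTree as ET

# Two hypothetical versions of the same well-formed page.
PAGE_V1 = """<html><body>
  <div><span class="price">42</span></div>
</body></html>"""

# After a redesign, an extra banner <div> shifts every positional index.
PAGE_V2 = """<html><body>
  <div>Banner</div>
  <div><span class="price">42</span></div>
</body></html>"""

# Candidate locators, tried in order (illustrative, not the thesis's rules).
LOCATORS = [
    "./body/div[1]/span",        # positional: brittle under layout changes
    ".//span[@class='price']",   # attribute-anchored: tolerates layout shifts
]


def extract(page, locators=LOCATORS):
    """Return the text of the first locator that matches, or None."""
    root = ET.fromstring(page)
    for path in locators:
        node = root.find(path)
        if node is not None and node.text:
            return node.text.strip()
    return None


if __name__ == "__main__":
    print(extract(PAGE_V1))
    print(extract(PAGE_V2))
    # Using the positional locator alone fails on the redesigned page.
    print(extract(PAGE_V2, ["./body/div[1]/span"]))
```

Ordering the locators from most to least specific is one possible combination strategy; a real wrapper would also need to guard against a locator that matches the wrong node rather than no node at all.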

Keywords/Search Tags:

XSLT, Information Extraction, XML

Related items

1	Web Information Extraction Based On Principle Part Extraction
2	Research On Web Information Extraction Techniques
3	Semi-structured In The XML-based Web Information Extraction
4	Design And Implementation Of Web Information Extraction Based On DOM
5	Research Of Web Information Extraction Based On XML
6	Automatic Extraction Of Information From Web Pages
7	The Research Of XML-Based Web Information Extraction
8	Based On The XML Deep Web Information Extraction System With The Initial Implementation