An Information Extraction System For DynamicView

Posted on:2007-08-23

Degree:Master

Type:Thesis

Country:China

Candidate:J He

GTID:2178360212965573

Subject:Computer software and theory

Abstract/Summary:

With the growth of the World Wide Web (WWW), Web Information Retrieval (WIR) and Web Information Extraction (WIE) technologies have developed rapidly, and more and more researchers are paying attention to how to extract information from the Web. WIR is used to locate a specific page on the Web that contains the relevant information. Unlike WIR, WIE extracts the relevant information directly from a specific page and transforms it into a structured format.

Generally speaking, WIE methods fall into two categories: those based on the structure of a page, such as Page Structure Grammar Inference and Page Segmentation, and those based on the language features of a page, such as template filling. Unlike in free-text information extraction, few annotated web pages are available for any specific domain on the Web. Hence, how to extract information with high accuracy without increasing the tedious manual work is a critical problem to be solved.

Based on an analysis of existing WIR and WIE algorithms and on the goals of the DynamicView project, this thesis proposes a WIR algorithm based on structure templates to find faculty homepages on the Web, and a page-segmentation-based WIE algorithm to extract faculty members' research interests from those homepages. The WIR algorithm applies WIE technology to retrieval, so that web pages sharing the same attributes can be found easily. The page segmentation algorithm for WIE, DeSeA (Delimiter based Segmentation Algorithm), filters irrelevant information out of a web page; research interests can then be extracted easily from the relevant segments using domain knowledge. Experiments show that both algorithms fit the needs of DynamicView well.
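
The abstract does not give the details of DeSeA, so the following is only a minimal Python sketch of the general idea of delimiter-based segmentation followed by keyword-driven extraction, not the thesis's algorithm. The delimiter set (`<hr>` and opening heading tags), the cue phrase "research interests", and the function names `segment_page` and `extract_interests` are all assumptions made for illustration.

```python
import re
from html import unescape

# Assumed delimiter set: <hr> and opening heading tags start a new segment.
DELIMITER_PATTERN = re.compile(r"<\s*(?:hr|h[1-6])\b[^>]*>", re.IGNORECASE)
# Any remaining tag is treated as layout and replaced by a space.
TAG_PATTERN = re.compile(r"<[^>]+>")
# Assumed domain knowledge: a cue phrase that marks a relevant segment.
CUE_PATTERN = re.compile(r"research\s+interests?\s*:?\s*", re.IGNORECASE)


def segment_page(html):
    """Split a page at delimiter tags and return non-empty text segments."""
    segments = []
    for chunk in DELIMITER_PATTERN.split(html):
        text = unescape(TAG_PATTERN.sub(" ", chunk))
        text = " ".join(text.split())  # collapse runs of whitespace
        if text:
            segments.append(text)
    return segments


def extract_interests(html):
    """Keep only segments containing the cue phrase and split what follows it."""
    interests = []
    for segment in segment_page(html):
        match = CUE_PATTERN.search(segment)
        if match is None:
            continue  # irrelevant segment, filtered out
        tail = segment[match.end():]
        for item in re.split(r"[,;]", tail):
            item = item.strip(" .:")
            if item:
                interests.append(item)
    return interests


if __name__ == "__main__":
    page = (
        "<html><body>"
        "<h1>Jane Doe</h1><p>Associate Professor</p>"
        "<h2>Research Interests</h2>"
        "<p>Data mining, machine learning; information extraction.</p>"
        "<h2>Contact</h2><p>Room 101</p>"
        "</body></html>"
    )
    print(extract_interests(page))
```

Segmenting before extraction means that the cue phrase only has to be matched inside one small block, so text from unrelated parts of the page (here the name and contact blocks) never reaches the extraction step.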

Keywords/Search Tags:

Web Information Retrieval, Web Information Extraction, Machine Learning, Semantic Web

Related items

1	Research On Ontology-based Semantic Information System
2	Study On Information Retrieval Based On The Interest Model Of User
3	Design And Implementation Of Medical Information Retrieval System Based On Semantic Analysis
4	Research On Learning To Rank For Information Retrieval
5	Using contextual information and machine learning technique to improve retrieval performance
6	Research On Semantic Extraction Of Content-based Video Retrieval
7	Research On Gazetteer Information Retrieval Service Based On Spatial Semantic Computation
8	The Research Of Key Technology In Personalized Information Retrieval Based On Internet
9	Research And Implementation Of Web Topical Information Extraction Method With Semantic Consideration
10	Research On Information Retrieval Model Based On Learning To Rank