Personalized Web Query System Based On Domain Knowledge And Information Extraction

Posted on:2003-10-11

Degree:Master

Type:Thesis

Country:China

Candidate:W Z Yang

Full Text:PDF

GTID:2168360122461064

Subject:Computer application technology

Abstract/Summary:

Since 1991, the Web has grown within a few years into a giant, global information resource. Because Web information is huge, distributed, and dynamic, and existing Web query tools are inefficient, locating information of interest on the Web is difficult and tedious. How to query Web information efficiently and precisely, and how to reuse the query results, has therefore become a pressing problem. To address it, we design and implement a "personalized Web query system based on domain knowledge and information extraction". First, the system divides Web information into domains by schema and style, and builds a domain knowledge database that guides the user's query and limits the query scope. Second, it uses an existing Web search engine to perform keyword queries; the query result is a set of related URLs. The user then browses the returned pages, selects a page of interest as a sample page, and creates a conceptual schema based on an understanding of the sample page's content. Third, the user marks the information blocks of interest in the sample page and establishes the correspondence between those blocks and the fields of the conceptual schema; meanwhile, the system passes this correspondence to the learning module, which forms extraction rules and stores them in the rule database. Fourth, the extraction module applies the extraction rules to pages with a similar structure and sends the extracted results to the classified Cache database for the users' further queries. Finally, users perform personalized Web queries through the query module. The prototype system has a friendly user interface and is easy to use.
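
The thesis omits the details of rule learning and extraction, so what follows is only a minimal Python sketch of the sample-page workflow under a deliberately simple assumption: each extraction rule records, for one conceptual-schema field, the left and right delimiter strings surrounding the value the user marked in the sample page (an LR-style wrapper, swapped in here purely for illustration). `ExtractionRule`, `learn_rule`, and `extract` are hypothetical names, and a real system would more likely anchor rules on HTML structure than on raw strings.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ExtractionRule:
    """Hypothetical LR-style rule: a field's value lies between `left` and `right`."""
    field: str
    left: str
    right: str


def learn_rule(field: str, sample_page: str, marked_value: str, context: int = 10) -> ExtractionRule:
    """Derive delimiters from the text immediately around the block the user marked.

    Assumes the marked value occurs in the sample page; its first occurrence is used.
    """
    start = sample_page.find(marked_value)
    if start < 0:
        raise ValueError(f"marked value for {field!r} not found in sample page")
    end = start + len(marked_value)
    left = sample_page[max(0, start - context):start]
    right = sample_page[end:end + context]
    return ExtractionRule(field, left, right)


def extract(page: str, rules: List[ExtractionRule]) -> Dict[str, Optional[str]]:
    """Apply rules in page order, scanning forward from the previous match.

    Fields whose delimiters are not found map to None.
    """
    record: Dict[str, Optional[str]] = {}
    pos = 0
    for rule in rules:
        i = page.find(rule.left, pos)
        if i < 0:
            record[rule.field] = None
            continue
        i += len(rule.left)
        j = page.find(rule.right, i)
        if j < 0:
            record[rule.field] = None
            continue
        record[rule.field] = page[i:j]
        pos = j  # continue scanning after this field
    return record


# Usage: learn from one marked sample row, then extract from a similarly structured row.
sample = "<tr><td>Title:</td><td>Data Mining</td><td>Price:</td><td>45</td></tr>"
rules = [learn_rule("title", sample, "Data Mining"), learn_rule("price", sample, "45")]
new_page = "<tr><td>Title:</td><td>Web Mining</td><td>Price:</td><td>38</td></tr>"
print(extract(new_page, rules))  # {'title': 'Web Mining', 'price': '38'}
```

Applying the rules in order with a moving cursor matters here: in the sample row both fields share the left delimiter `:</td><td>`, so searching from the start of the page would resolve the price rule to the title cell.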
The system uses existing Web search engines to query the Web, uses information extraction technology to extract information from Web pages of interest, and stores the extracted information in the classified Cache database to avoid re-querying the Web. This thesis focuses on the implementation of the overall system and on the application of the domain knowledge database, the classified cache, and the rule database; it omits the details of learning and information extraction.
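
The role of the classified Cache database can be pictured as a store that groups extracted records by domain and answers later queries locally, falling back to Web search and extraction only when nothing matches. The following Python sketch is an in-memory assumption, not the thesis's actual database design: `ClassifiedCache`, `store`, `query`, and `query_or_fetch` are hypothetical names, and the `fetch` callback stands in for the search-engine-plus-extraction pipeline.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional

Record = Dict[str, Optional[str]]


class ClassifiedCache:
    """Hypothetical in-memory stand-in for the classified Cache database.

    Extracted records are grouped by domain so that later queries can be
    answered locally instead of re-querying the Web.
    """

    def __init__(self) -> None:
        self._by_domain: Dict[str, List[Record]] = defaultdict(list)

    def store(self, domain: str, records: List[Record]) -> None:
        """Add extracted records under their domain."""
        self._by_domain[domain].extend(records)

    def query(self, domain: str, **conditions: str) -> List[Record]:
        """Return cached records in `domain` whose fields equal every given condition."""
        return [
            r for r in self._by_domain.get(domain, [])
            if all(r.get(k) == v for k, v in conditions.items())
        ]

    def query_or_fetch(
        self, domain: str, fetch: Callable[[], List[Record]], **conditions: str
    ) -> List[Record]:
        """Answer from the cache; call `fetch` (search + extraction) only on a miss."""
        hits = self.query(domain, **conditions)
        if hits:
            return hits
        self.store(domain, fetch())
        return self.query(domain, **conditions)
```

For example, a first `query_or_fetch("books", fetch, title="Web Mining")` would invoke `fetch` and cache its records, while a repeated identical query would be served from the cache without calling `fetch` again. Matching on exact field equality keeps the sketch short; a real query module would likely support richer conditions.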

Keywords/Search Tags:

HTML, Information extraction, Domain knowledge, Personalized Web query, Classified Cache, Rule management

Related items

1	Semi-structured Web Information Extraction Technology And Its Application
2	Design And Implementation Of Knowledge Base Augmentation System Based On HTML Tables
3	Research On Knowledge Extraction Technique In The CBT Development Platform
4	Research And Application On The Technology Of Web Information Extraction Based On The HTML
5	Unstructured Information Extraction Methods For Domain-Specific Knowledge Graphs
6	Research On The Technology Of The Web Employment Information Extraction Based On The HTML
7	Based On The Html Pages Of Web Information Extraction
8	Construction Of Telecom Consulting Domain Ontology And Generation Of The Content Of Knowledge Entity
9	A Research On Methods Of Knowledge Acquisition From Domain-Specific Texts And Their Application In Knowledge Acquisition From Archaeological Texts
10	Research On The HTML And PDF Information Extraction Technology Based XML