
Personalized Web Query System Based On Domain Knowledge And Information Extraction

Posted on:2003-10-11
Degree:Master
Type:Thesis
Country:China
Candidate:W Z Yang
Full Text:PDF
GTID:2168360122461064
Subject:Computer application technology
Abstract/Summary:
Since 1991, the Web has grown within a few years into a giant, global information resource. Because Web information is vast, distributed, and dynamic, and because existing Web query tools are inefficient, locating information of interest on the Web is difficult and tedious. How to query Web information efficiently and precisely, and how to reuse the query results, has therefore become a pressing problem.

We design and implement a "personalized Web query system based on domain knowledge and information extraction" to address this problem. First, the system divides Web information into domains by schema and style and builds a domain knowledge database that guides the user's query and limits the query scope. Second, it uses an existing Web search engine to perform keyword queries; the result is a set of related URLs. The user then browses the Web pages, selects a page of interest as the sample page, and creates a conceptual schema based on an understanding of the sample page's content. Third, the user marks the information blocks of interest in the sample page and establishes the correspondence between those blocks and the fields of the conceptual schema; meanwhile, the system passes this correspondence to the learning module, which forms extraction rules and stores them in the rule database. Fourth, the extraction module applies the extraction rules to similarly structured pages and sends the extracted results to the classified Cache database for users' further queries. Finally, users use the query module to perform personalized Web queries.

The prototype system has a friendly user interface and is easy to use. It uses existing Web search engines to query the Web, uses information extraction technology to extract information from Web pages of interest, and stores the extracted information in the classified Cache database to avoid requerying the Web. This thesis emphasizes the implementation of the integrated system and the application of the domain knowledge database, the classified cache, and the rule database; it does not cover the details of learning and information extraction.
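
The flow from marked sample page to extraction rules to classified cache can be illustrated with a small Python sketch. The abstract does not say how the learning module forms rules, so this sketch assumes a simple left/right delimiter approach: each rule is the HTML context immediately surrounding a marked block. The names `learn_rules`, `extract`, `ClassifiedCache`, and `CONTEXT`, as well as the sample pages, are hypothetical and are not taken from the thesis.

```python
import re

CONTEXT = 12  # assumption: characters of surrounding HTML kept as a delimiter


def learn_rules(sample_html, marked):
    """Form one (left, right) delimiter pair per conceptual-schema field.

    `marked` maps a field name to the text block the user marked
    in the sample page.
    """
    rules = {}
    for field, block in marked.items():
        start = sample_html.find(block)
        if start < 0:
            raise ValueError(f"marked block for {field!r} not in sample page")
        end = start + len(block)
        left = sample_html[max(0, start - CONTEXT):start]
        right = sample_html[end:end + CONTEXT]
        rules[field] = (left, right)
    return rules


def extract(page_html, rules):
    """Apply the delimiter rules to a similarly structured page."""
    record = {}
    for field, (left, right) in rules.items():
        pattern = re.escape(left) + r"(.*?)" + re.escape(right)
        match = re.search(pattern, page_html, re.DOTALL)
        record[field] = match.group(1) if match else None
    return record


class ClassifiedCache:
    """In-memory stand-in for the classified Cache database, keyed by domain."""

    def __init__(self):
        self._records = {}

    def store(self, domain, record):
        self._records.setdefault(domain, []).append(record)

    def query(self, domain, **conditions):
        # Answer from the cache only; the Web is not queried again.
        return [r for r in self._records.get(domain, [])
                if all(r.get(k) == v for k, v in conditions.items())]


# Hypothetical sample page and a second page with the same structure.
sample_page = ('<html><body><h1 class="title">Data Mining</h1>'
               '<span class="price">$30</span></body></html>')
similar_page = ('<html><body><h1 class="title">Compilers</h1>'
                '<span class="price">$45</span></body></html>')

rules = learn_rules(sample_page, {"title": "Data Mining", "price": "$30"})
record = extract(similar_page, rules)

cache = ClassifiedCache()
cache.store("books", record)
hits = cache.query("books", title="Compilers")
```

Fixed-width delimiters are fragile when page layouts vary or when a marked block sits at the very end of a page, so a real learning module would likely generalize over several sample pages. The sketch only shows how the rule database, the extraction module, and the classified cache relate to one another.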
Keywords/Search Tags:HTML, Information extraction, Domain knowledge, Personalized Web query, Classified Cache, Rule management