
RSS And Ontology Semantic Based Autonomic Web Page Collection In Vertical Search Engine

Posted on: 2009-06-11 | Degree: Master | Type: Thesis
Country: China | Candidate: H B Zhang
GTID: 2178360245486064 | Subject: Computer applications
Abstract/Summary:
Search engines are important tools for quickly locating information online. Users retrieve relevant information through keyword or full-text search. General-purpose engines return massive amounts of information for a query, but they have difficulty keeping their indexes comprehensive and up to date, often fail to deliver highly accurate and relevant results, and cannot satisfy personalized or domain-specific queries. Vertical search can be regarded as an extension and customization of general-purpose search. A vertical search engine focuses on a particular domain: it identifies and integrates domain-specific information, extracts the required data, and packages it as formatted information. Topic-oriented web page collection is the key, foundational component of such an engine. Building on an analysis of vertical search, the author has carried out extensive research on, and an implementation of, web page collection. The main contributions of this thesis are as follows:
1. It proposes HPath, a web extraction method based on DOM parsing, to handle heterogeneous DOM structures. This provides a theoretical and practical basis for commercial topic-oriented web page collection and vertical search engines.
2. It presents a high-precision topic web page collection scheme based on Web 2.0 techniques, which resolves the problem of multiple coexisting RSS standards.
3. It presents an ontology-based semantic adaptation solution to cope with the heterogeneous semantics of web pages from different systems, and defines a semantic distance function for web page induction and classification.
4. It adapts the ECA (event-condition-action) rule system to IBM's autonomic computing framework, and designs an autonomic web page collection system aimed at applicability and maintainability.
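Contribution 2 mentions the "multi-standard problem in RSS", i.e. feeds arriving as RSS 0.9x/2.0, RSS 1.0 (RDF) or Atom. The thesis does not describe its scheme here, so the following is only a minimal sketch, assuming the goal is to normalize all three formats into one item record. The `FeedItem` type and `parse_feed` function are hypothetical names; only Python's standard `xml.etree.ElementTree` is used.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Namespace prefixes in ElementTree's "{uri}tag" notation.
ATOM = "{http://www.w3.org/2005/Atom}"
RSS1 = "{http://purl.org/rss/1.0/}"
RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"


@dataclass
class FeedItem:
    """Hypothetical common record that every feed dialect is mapped onto."""
    title: str
    link: str


def _text(elem: ET.Element, tag: str) -> str:
    """Return the stripped text of the first child `tag`, or '' if absent."""
    child = elem.find(tag)
    return (child.text or "").strip() if child is not None else ""


def parse_feed(xml_text: str) -> list[FeedItem]:
    """Detect the feed dialect from the root element and normalize its items."""
    root = ET.fromstring(xml_text)
    items: list[FeedItem] = []
    if root.tag == "rss":  # RSS 0.9x / 2.0: <rss><channel><item>, no namespace
        for it in root.iter("item"):
            items.append(FeedItem(_text(it, "title"), _text(it, "link")))
    elif root.tag == ATOM + "feed":  # Atom 1.0: URL lives in <link href="...">
        for entry in root.findall(ATOM + "entry"):
            href = ""
            for link_el in entry.findall(ATOM + "link"):
                # A missing rel attribute means rel="alternate" in Atom.
                if link_el.get("rel", "alternate") == "alternate":
                    href = link_el.get("href", "")
                    break
            items.append(FeedItem(_text(entry, ATOM + "title"), href))
    elif root.tag == RDF + "RDF":  # RSS 1.0: <item> elements are children of rdf:RDF
        for it in root.findall(RSS1 + "item"):
            items.append(FeedItem(_text(it, RSS1 + "title"), _text(it, RSS1 + "link")))
    else:
        raise ValueError(f"unsupported feed format: {root.tag}")
    return items
```

Dispatching on the root element keeps each dialect's quirks (Atom's `href` attribute, RSS 1.0's namespaced items) in one place, so downstream topic filtering only ever sees `FeedItem` records.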
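Contribution 3 defines a semantic distance function over an ontology, but its exact form is not given in the abstract. As an illustration only, the sketch below substitutes the common edge-counting measure: the distance between two concepts is the number of is-a edges on the shortest path through their lowest common ancestor, and a page's concept is assigned to the nearest category. The toy ontology and the names `semantic_distance` and `classify` are hypothetical.

```python
from __future__ import annotations


def ancestors(concept: str, parent: dict[str, str]) -> list[str]:
    """Return the chain [concept, parent, grandparent, ..., root]."""
    path = [concept]
    while concept in parent:
        concept = parent[concept]
        path.append(concept)
    return path


def semantic_distance(a: str, b: str, parent: dict[str, str]) -> int:
    """Edge-counting distance via the lowest common ancestor (a stand-in, not the thesis's function)."""
    depth_from_a = {c: i for i, c in enumerate(ancestors(a, parent))}
    for j, c in enumerate(ancestors(b, parent)):
        if c in depth_from_a:  # first shared concept walking up from b is the LCA
            return depth_from_a[c] + j
    raise ValueError(f"{a!r} and {b!r} share no common ancestor")


def classify(concept: str, categories: list[str], parent: dict[str, str]) -> str:
    """Assign a concept to the semantically closest category (ties: first listed)."""
    return min(categories, key=lambda cat: semantic_distance(concept, cat, parent))


# Hypothetical toy ontology: child -> parent (is-a) links.
ONTOLOGY = {
    "laptop": "computer", "desktop": "computer",
    "computer": "electronics", "phone": "electronics",
    "electronics": "product", "book": "product",
}
```

For example, `semantic_distance("laptop", "desktop", ONTOLOGY)` is 2 (both sit under `computer`), while `laptop` and `book` only meet at `product`, giving 4; a real system would likely weight edges by depth or information content rather than counting them uniformly.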
Keywords/Search Tags: vertical search, topic oriented web page, heterogeneous integration, RSS, XPath, Web2.0