
An Ontology-based Approach To Integrating Interfaces On Deep Web

Posted on: 2012-12-27
Degree: Master
Type: Thesis
Country: China
Candidate: A Q Zhang
GTID: 2178330335950225
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of the WWW, large amounts of web information are stored in databases in structured form, which makes the web deeper. Fetching, managing, and using this information is a great challenge because of its sharp increase. Much of this information can only be accessed dynamically through the search interfaces of the deep web, and these interfaces have attracted researchers' close attention because they are the main way to obtain information from the deep web. Each web site has its own interface style and database design, so different web sites in the same domain often provide different interfaces. Users who want to retrieve information from several web databases must fill in several interfaces, which usually ask for duplicate information. This not only wastes time and energy but also decreases the quality of the search results. Users therefore need a unified interface, integrated from multiple interfaces, to access, compare, and select useful information from different databases at the same time.

Numerous techniques have been proposed for interface integration. However, most existing techniques find matches from large amounts of statistical data, which both requires a large sample space and ignores the semantic relationships between attributes, so they often produce incorrect matches. Furthermore, the schema information of attributes is seldom used during interface integration, which makes the integrated results incomplete and inaccurate. This thesis therefore proposes a novel ontology-based interface integration approach.

1. A domain ontology is built by analyzing pages from numerous domains.

2. A two-step schema-matching process is presented to identify, from a semantic perspective, correct matches between schema attributes and ontology information. It consists of a direct match and an indirect match. The direct match focuses mainly on keyword matching, attribute names, and attribute values, and it may fail to recognize correct matches because the ontology information is finite. The indirect match first computes the similarities between attributes and ontology information using WordNet, and then selects the pair with the maximal similarity value as the successful match. The domain ontology is updated in real time during the matching process, and each matching result is added to the Concept-Match-Table, which consists of pairs of an attribute name and an ontology concept. These operations make subsequent matching easier.

After the matching process, each attribute has obtained its successful match. However, the attributes may carry other information, such as format, values, and layout position, which is of great importance in subsequent matching and integration. We therefore design a matching post-processing step that processes and saves this useful information (attribute format, values, and so on), which improves the efficiency and precision of subsequent matching and integration.

3. After all input interfaces have been processed, we present in detail a series of generation rules and layout rules for unified attribute schemas, based on the matching information and the schema information.

4. Following the described workflow and key technologies of ontology-based interface integration, an ontology-based interface integration system is designed and implemented. Several interfaces from the book domain in the UIUC dataset are used to evaluate our approach.

The experimental results show that our two-step matching approach can identify not only simple matches but also complex matches, and that it achieves higher accuracy. Both ontology information and attribute schema information are considered during interface integration, which makes the unified interface more comprehensive and precise.
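
The two-step matching described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The toy ontology, the function names, and the 0.6 threshold are assumptions made for illustration only, and plain string similarity from Python's standard difflib module stands in for the WordNet-based similarity used in the thesis.

```python
from difflib import SequenceMatcher

# Hypothetical toy ontology: concept -> known labels (not taken from the thesis).
ontology = {
    "title": {"title", "book title"},
    "author": {"author", "writer"},
    "publisher": {"publisher"},
}

# Concept-Match-Table: pairs of attribute name and ontology concept.
concept_match_table = {}


def normalize(name):
    """Lower-case the name and collapse underscores and extra whitespace."""
    return " ".join(name.lower().replace("_", " ").split())


def direct_match(attr):
    """Step 1: look the attribute name up among the known concept labels."""
    for concept, labels in ontology.items():
        if attr in labels:
            return concept
    return None


def similarity(a, b):
    # Assumption: string similarity as a stand-in for WordNet similarity.
    return SequenceMatcher(None, a, b).ratio()


def indirect_match(attr, threshold=0.6):
    """Step 2: pick the concept with the maximal similarity.

    The threshold is an added assumption so that unrelated
    attributes are left unmatched instead of forced onto a concept.
    """
    best_concept, best_score = None, 0.0
    for concept, labels in ontology.items():
        score = max(similarity(attr, label) for label in labels)
        if score > best_score:
            best_concept, best_score = concept, score
    return best_concept if best_score >= threshold else None


def match_attribute(raw_name):
    """Try the direct match first, then fall back to the indirect match."""
    attr = normalize(raw_name)
    concept = direct_match(attr) or indirect_match(attr)
    if concept is not None:
        ontology[concept].add(attr)              # update the ontology
        concept_match_table[raw_name] = concept  # record the match
    return concept
```

In this sketch, once an attribute such as "Authors" has been resolved by the indirect match, its normalized name is added to the ontology, so a later interface using the same label is resolved by the direct match alone. This mirrors the abstract's remark that updating the ontology makes subsequent matching easier.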
Keywords/Search Tags:Ontology, Deep web, Schema matching, Interface integration