Study On Methods Of Ontology-Based Deep Web Data Integration

Posted on:2013-08-26

Degree:Master

Type:Thesis

Country:China

Candidate:D S Li

Full Text:PDF

GTID:2248330362471397

Subject:Computer application technology

Abstract/Summary:


With the proliferation of Web information, more and more information is moving from static web pages to online databases maintained by web servers (the Deep Web). Compared with the Surface Web, the Deep Web holds a larger volume of information, of higher quality, and is growing faster. In recent years, Deep Web research has become a hotspot in the field of web research.

The purpose of Deep Web data integration research is to enable searching Deep Web information across more domains. Deep Web data source discovery and data extraction are two key points of this research. Scholars have proposed a number of Deep Web data source discovery frameworks and data extraction algorithms. In the current mainstream data source discovery frameworks, the ontology cannot be extended automatically and the frameworks lack adaptability. Mainstream data extraction algorithms suffer from low recall and precision when used to extract data from query result pages.

To address these problems, semi-automatic construction and automatic extension of the ontology are added to the data source discovery framework to increase its adaptability. A method that combines indexing with edit similarity is proposed to improve the recall and precision of data extraction from result pages. The study covers the following aspects:

1. Research on ontology-based Deep Web data source discovery. Web page classification, form content classification, and form structure classification are used to find Deep Web query interfaces in given domains, and semi-automatic construction and automatic extension of the ontology are added to the web page and form content classification methods. Starting from a core ontology constructed by domain experts, vocabulary in web pages that has high similarity to the core ontology is extracted as candidate expansion vocabulary. Then, following the ontology expansion strategy, these candidates are used to expand the ontology.

2. Research on Deep Web data extraction from query result pages. A method that combines indexing with edit similarity is proposed. The indexes of keywords in the query result page are recorded, and the largest common node containing the keywords is found to determine the primary data area of the page. Data blocks are then extracted from the primary data area by computing the edit similarity between data blocks, filtering out blocks with low similarity, and discarding extraction results whose average similarity to the ontology is low.

3. Experiments verify that the methods are feasible. The Deep Web data source discovery framework shows a degree of adaptability and reduces the human effort needed for ontology construction and expansion. According to the experimental comparison, the method combining indexing with edit similarity improves the recall and precision of data extraction from query result pages.
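
The edit-similarity filtering of data blocks mentioned in aspect 2 can be illustrated with a minimal Python sketch. The abstract does not give the thesis's actual formula, block representation, or threshold, so everything here is an assumption made for illustration: data blocks are modelled as sequences of HTML tag names, similarity is taken as one minus the normalized Levenshtein distance, and the names `edit_distance`, `edit_similarity`, `filter_blocks` and the threshold `0.5` are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def edit_similarity(a, b):
    """Assumed definition: 1 - distance / length of the longer sequence."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def filter_blocks(blocks, threshold=0.5):
    """Keep blocks whose average similarity to the other blocks
    reaches the (hypothetical) threshold."""
    kept = []
    for i, block in enumerate(blocks):
        sims = [edit_similarity(block, other)
                for j, other in enumerate(blocks) if j != i]
        average = sum(sims) / len(sims) if sims else 1.0
        if average >= threshold:
            kept.append(block)
    return kept


# Candidate blocks from a primary data area, as tag-name sequences (made-up data).
blocks = [
    ["tr", "td", "a", "td", "span"],
    ["tr", "td", "a", "td", "span"],
    ["tr", "td", "a", "td", "span", "img"],
    ["div", "script"],  # structurally unlike the records around it
]
kept = filter_blocks(blocks, threshold=0.5)
```

In this made-up input the three record-like blocks resemble one another and are kept, while the structurally different block is dropped. Comparing tag sequences rather than raw text is a design choice of this sketch, not something stated in the abstract, and the further step of discarding results with low average similarity to the ontology is not shown.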

Keywords/Search Tags:

Deep Web, data integration, ontology, data resource, adaptability, web data extraction


Related items

1	Research On Key Technologies Of Ontology-Based Deep Web Information Integration
2	Study On Information Extraction And Data Annotation Of Deep Web Data Integration System
3	Resource-oriented Data Integration Model Application In The Medical System
4	Research On Key Issues In Deep Web Data Integration
5	Key Techniques On Deep Web Data Extraction
6	The Key Technology Research Of Deep Web Data Integration
7	Research On Adaptive Wrapper In Deep Web Data Extraction
8	Research And Application Of Deep Web Data Cleansing
9	Research And Application Of Heterogeneous Data Integration Based On Ontology In Data Warehouses
10	The Study Of Deep Web Data Integration System Design And Application