
Study on Methods of Ontology-Based Deep Web Data Integration

Posted on: 2013-08-26
Degree: Master
Type: Thesis
Country: China
Candidate: D S Li
Full Text: PDF
GTID: 2248330362471397
Subject: Computer application technology
Abstract/Summary:
With the proliferation of Web information, more and more information is moving from static web pages to online databases maintained by web servers (the Deep Web). Compared with the Surface Web, the Deep Web holds more information, of higher quality, and grows faster. In recent years, Deep Web research has become a hotspot in the field of web research.

The purpose of Deep Web data integration research is to make Deep Web information searchable across more domains. Deep Web data source discovery and data extraction are its two key problems. Scholars have proposed a number of data source discovery frameworks and data extraction algorithms. In the current mainstream data source discovery frameworks, however, the ontology cannot be extended automatically and the frameworks lack adaptability. Mainstream data extraction algorithms suffer from low recall and precision when extracting data from query result pages.

To address these problems, semi-automatic construction and automatic extension of the ontology are added to the data source discovery framework to make it more adaptable, and a method that combines indexing with edit similarity is proposed to improve the recall and precision of data extraction from result pages. The work covers the following aspects:

1. Research on ontology-based Deep Web data source discovery. Web page classification, form content classification, and form structure classification are used to find Deep Web query interfaces in given domains, and semi-automatic construction and automatic extension of the ontology are added to the web page and form content classification methods. Starting from a core ontology constructed by domain experts, terms in web pages that are highly similar to the core ontology are extracted as candidate expansion terms; the ontology is then expanded with these terms according to an ontology expansion strategy.

2. Research on Deep Web data extraction from query result pages, for which a method combining indexing with edit similarity is proposed. The indexes of keywords in the result page are recorded, and the largest common node containing the keywords is located to determine the primary data area. Edit similarity is then computed between data blocks; blocks with low similarity are filtered out, and extraction results whose average similarity to the ontology is low are discarded, yielding the data blocks of the primary data area.

3. Experiments indicate that the methods are feasible. The data source discovery framework shows a degree of adaptability and reduces the human effort spent on ontology construction and expansion. In the experimental comparison, the method combining indexing with edit similarity improves the recall and precision of data extraction from query result pages.
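
The abstract does not say how the common node containing the keywords is found. The following is only a minimal sketch of one plausible reading: the primary data area is taken to be the deepest element that is an ancestor of every element whose text contains the keyword. The function names `path_to_root` and `primary_data_area` are hypothetical, and the page is assumed to be well-formed XHTML so that the standard-library `xml.etree.ElementTree` parser can read it.

```python
import xml.etree.ElementTree as ET


def path_to_root(node, parents):
    """Return the list of elements from the document root down to node."""
    path = [node]
    while node in parents:
        node = parents[node]
        path.append(node)
    path.reverse()
    return path


def primary_data_area(root, keyword):
    """Deepest element containing every element whose text has the keyword.

    Assumption: only each element's leading text (el.text) is inspected;
    tail text after child elements is ignored to keep the sketch short.
    """
    # ElementTree keeps no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    needle = keyword.lower()
    hits = [el for el in root.iter() if needle in (el.text or "").lower()]
    if not hits:
        return None
    paths = [path_to_root(el, parents) for el in hits]
    common = None
    # Walk all root-to-hit paths in lockstep until they diverge.
    for level in zip(*paths):
        if all(node is level[0] for node in level):
            common = level[0]
        else:
            break
    return common
```

With a single keyword hit the function returns the hit element itself, so a real extractor would probably require several hits before trusting the result.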
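
The edit-similarity filtering of data blocks can be illustrated in a similar way. The sketch below assumes that each data block is represented as a sequence of HTML tag names, that edit similarity is one minus the Levenshtein distance divided by the longer length, and that a block is kept when its mean similarity to the other blocks reaches a threshold. These choices, the 0.5 default, and the function names are illustrative assumptions, not the thesis's stated definitions. The later comparison against the ontology is not shown.

```python
from typing import List, Sequence


def edit_distance(a: Sequence, b: Sequence) -> int:
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        curr = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def edit_similarity(a: Sequence, b: Sequence) -> float:
    """Normalised similarity in [0, 1]; 1.0 means identical sequences."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def filter_blocks(blocks: List[Sequence], threshold: float = 0.5) -> List[Sequence]:
    """Keep blocks whose mean similarity to the other blocks meets threshold."""
    if len(blocks) < 2:
        return list(blocks)
    kept = []
    for i, block in enumerate(blocks):
        scores = [edit_similarity(block, other)
                  for j, other in enumerate(blocks) if j != i]
        if sum(scores) / len(scores) >= threshold:
            kept.append(block)
    return kept
```

Comparing tag sequences rather than raw text means that records with different content but the same layout still score as similar, which is the property a record extractor usually wants; the threshold would need tuning on real result pages.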
Keywords/Search Tags:Deep Web, data integration, ontology, data resource, adaptability, web data extraction