
Study On Information Extraction And Data Annotation Of Deep Web Data Integration System

Posted on: 2011-08-02
Degree: Master
Type: Thesis
Country: China
Candidate: Y Gao
Full Text: PDF
GTID: 2178360308454101
Subject: Computer application technology
Abstract/Summary:
Because data on the Internet is heterogeneous and diverse, the structure of Web sites differs even among sites in the same domain. Consequently, it has become increasingly difficult for users to extract the data they are interested in from the vast and growing amount of data on the Web. An important task today is to extract the relevant data from result pages, add semantic information, and integrate the results into a unified structured form for subsequent processing and application. This task is Web data extraction and semantic annotation in a Deep Web data integration system.

In the study of Deep Web data integration systems, existing data extraction methods depend on the query interface schema and the query result schema, and they use tree-edit-distance algorithms, which leads to high time complexity and reduces the effectiveness of data extraction. In this paper, we apply XML technology to Web data extraction and ontologies to semantic annotation. The main contributions of this paper are as follows:

1. This paper describes a novel method of Web data extraction based on index paths (DEIP). First, it establishes an index path for each text node. Second, it locates the data-rich region by keywords, generates an extraction rule, and outputs a wrapper accordingly. The wrapper can then automatically extract data from pages of the same Web site in the same domain. The method makes full use of the continuity and structural similarity of the data to find the data-rich region and extract the data; moreover, it does not depend on HTML tags or tree-edit distance.
2. This paper splits data units with nested attributes by applying domain knowledge, data content, data form, and data types. In the result pages returned by some Web sites, a single label contains more than one attribute, and these must be separated before semantic annotation is added.
3. This paper uses the mappings between concepts of the ontology, and between concepts and instances of a books ontology, to annotate part of the attribute values, and applies special data formats to create annotation rules for the remaining attribute values.
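
The abstract does not define the index path precisely, so the following is only a minimal sketch of how the first contribution might look, under stated assumptions. It assumes an index path is the sequence of child positions from the document root to a text node, and that the data-rich region is the deepest subtree containing every text node that matches a keyword. The names `IndexPathParser`, `index_paths`, `data_rich_region`, `extract_records`, and the `SAMPLE` page are hypothetical and do not come from the thesis.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag (a small illustrative subset).
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class IndexPathParser(HTMLParser):
    """Records an assumed 'index path' for every non-empty text node."""

    def __init__(self):
        super().__init__()
        self.path = []            # child indices of the currently open elements
        self.child_counts = [0]   # children seen so far at each open depth
        self.text_nodes = []      # list of (index_path_tuple, text)

    def _next_index(self):
        idx = self.child_counts[-1]
        self.child_counts[-1] += 1
        return idx

    def handle_starttag(self, tag, attrs):
        idx = self._next_index()
        if tag in VOID_TAGS:
            return                # counts as a child but opens no subtree
        self.path.append(idx)
        self.child_counts.append(0)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.path:
            self.path.pop()
            self.child_counts.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:                  # whitespace-only nodes are ignored
            idx = self._next_index()
            self.text_nodes.append((tuple(self.path) + (idx,), text))


def index_paths(html):
    """Return (index_path, text) pairs for all text nodes in the page."""
    parser = IndexPathParser()
    parser.feed(html)
    parser.close()
    return parser.text_nodes


def data_rich_region(text_nodes, keywords):
    """Longest common path prefix of all text nodes that contain a keyword."""
    hits = [path for path, text in text_nodes
            if any(k.lower() in text.lower() for k in keywords)]
    if not hits:
        return None
    prefix = []
    for parts in zip(*hits):
        if len(set(parts)) != 1:
            break
        prefix.append(parts[0])
    return tuple(prefix)


def extract_records(text_nodes, region):
    """Group text nodes inside the region by the child they fall under."""
    n = len(region)
    records = {}
    for path, text in text_nodes:
        if len(path) > n and path[:n] == region:
            records.setdefault(path[n], []).append(text)
    return [records[k] for k in sorted(records)]


# Hypothetical result page with two structurally similar book records.
SAMPLE = (
    "<html><body>"
    "<div>Site menu</div>"
    "<ul>"
    "<li><b>Title:</b> Book A <i>Price:</i> 10</li>"
    "<li><b>Title:</b> Book B <i>Price:</i> 12</li>"
    "</ul>"
    "</body></html>"
)

if __name__ == "__main__":
    nodes = index_paths(SAMPLE)
    region = data_rich_region(nodes, ["Title", "Price"])
    print(region)
    print(extract_records(nodes, region))
```

In this sketch the region prefix plays the role of a very simple extraction rule: any page from the same site that shares the layout can be processed by calling `extract_records` with the same prefix. A real wrapper would need to handle malformed HTML and layout variation, which this example does not attempt.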
Keywords/Search Tags: Deep Web, Data extraction, Semantic annotation, Ontology