
Deep Web Data Annotation Based On Result Schema

Posted on: 2012-07-27
Degree: Master
Type: Thesis
Country: China
Candidate: X L Li
GTID: 2178330335466864
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of Internet technology and the growing volume of information it carries, more and more Deep Web resources can be reached only through Web query interface forms. Such sources are called Web databases (abbreviated WDB), and they are now widely used. Information retrieval is an important Internet application, and as research on information retrieval in the network environment deepens, Deep Web data integration systems are attracting increasing attention. Recent research shows that the Deep Web holds a great deal of valuable information that is highly relevant to market demand, and acquiring this information automatically requires building a Deep Web data integration system. Most WDB result pages are structured HTML documents generated from templates. However, because HTML is loosely constrained and Web content is highly varied, Web data is scattered and heterogeneous, which makes building a Deep Web data integration system very difficult.

Semantic annotation is a key part of the query result processing module of a Deep Web data integration system. Its main task is to attach correct semantic information to the data extracted from Deep Web search results, which raises the value of that data and makes it recognizable and processable by computers. This thesis first introduces the research background and related knowledge of the Deep Web. Next, it studies schema extraction and semantic annotation techniques in depth and proposes corresponding methods and model diagrams. Finally, it uses result schema information to annotate WDB data effectively. The main contributions are:

1. To address the loss of Deep Web result schema information, a novel approach for extracting Deep Web result schemas based on heuristic information is proposed. The approach analyzes the data on Deep Web result pages and uses heuristic information to assign correct attribute names to it, yielding the corresponding result schema; structural conflicts are then resolved through standardization. Experimental results show that the method extracts result schemas effectively.

2. A comparison of the strengths and weaknesses of existing WDB semantic annotation methods shows that they cannot effectively annotate query result data, so an approach to Deep Web data annotation based on the result schema is proposed. The approach analyzes Deep Web result pages and extracts their structured data as a preprocessing step, then establishes correct semantic mappings between the integrated result schema and the data to be annotated, so that the Deep Web data is annotated correctly. Experiments in four real-world domains show that the proposed method annotates Deep Web data efficiently.
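The two contributions above can be illustrated with a toy pipeline. This is only a sketch under stated assumptions, not the thesis's method: the abstract does not say which heuristics are used or how the semantic mapping is built, so every heuristic, synonym entry, and function name below (`normalize`, `label_column`, `extract_result_schema`, `annotate`) is hypothetical. The sketch labels each result column by trying, in order, a column header on the result page, a query term echoed in the column, and a value-pattern rule. It treats "standardization" narrowly as normalizing attribute names through a synonym table, and it annotates records by exact name matching against an integrated result schema.

```python
import re

# Hypothetical value-pattern heuristics: (regex, attribute name).
VALUE_PATTERNS = [
    (re.compile(r"^\$\s?\d+(\.\d{2})?$"), "price"),
    (re.compile(r"^\d{4}-\d{2}-\d{2}$"), "date"),
    (re.compile(r"^(97[89])?\d{9}[\dX]$"), "isbn"),
]

# Hypothetical synonym table used to standardize attribute names
# (a narrow stand-in for resolving naming conflicts between sources).
SYNONYMS = {"writer": "author", "authors": "author", "cost": "price",
            "book title": "title", "pub date": "date"}


def normalize(label):
    """Standardize a label: lowercase, drop punctuation, map synonyms."""
    key = re.sub(r"[^a-z0-9 ]", "", label.lower()).strip()
    return SYNONYMS.get(key, key)


def label_column(values, header=None, query=None):
    """Guess an attribute name for one result column using ordered heuristics."""
    # Heuristic 1: an explicit column header on the result page.
    if header:
        return normalize(header)
    # Heuristic 2: the column echoes a term the user typed into a form field.
    for field_label, term in (query or {}).items():
        if term and any(term.lower() in v.lower() for v in values):
            return normalize(field_label)
    # Heuristic 3: a majority of values match a known value pattern.
    for pattern, name in VALUE_PATTERNS:
        hits = sum(1 for v in values if pattern.match(v.strip()))
        if hits * 2 > len(values):
            return name
    return None  # the column stays unlabeled


def extract_result_schema(records, headers=None, query=None):
    """Build a local result schema: column index -> standardized attribute name."""
    columns = list(zip(*records))
    headers = headers or [None] * len(columns)
    return {i: label_column(list(col), headers[i], query)
            for i, col in enumerate(columns)}


def annotate(records, local_schema, integrated_schema):
    """Annotate records with names from the integrated schema (exact match only)."""
    mapping = {i: name for i, name in local_schema.items()
               if name in integrated_schema}
    return [{mapping[i]: v for i, v in enumerate(rec) if i in mapping}
            for rec in records]


if __name__ == "__main__":
    records = [["Deep Web Mining", "J. Smith", "$25.00"],
               ["Web Data Integration", "A. Lee", "$40.50"]]
    schema = extract_result_schema(records, query={"Title:": "web", "Author:": "smith"})
    print(schema)
    print(annotate(records, schema, {"title", "author", "price", "isbn"}))
```

With the query `{"Title:": "web", "Author:": "smith"}`, the three columns of the two book records are labeled `title`, `author`, and `price`. If the page instead carries headers such as `Book Title`, `Writer`, and `Cost`, the synonym table maps them to the same three names. A real integration system would likely need fuzzier mapping (for example, label similarity against the integrated schema) rather than the exact name equality used here.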
Keywords/Search Tags: Deep Web, Semantic Annotation, Interface Schema, Result Schema, Heuristic Information, Data Annotation, Data Extracting