Research On Technology Of Deep Web Oriented Data Extraction And Semantic Annotation

Posted on: 2011-02-01
Degree: Master
Type: Thesis
Country: China
Candidate: H P Chen
Full Text: PDF
GTID: 2178360305476537
Subject: Management Science and Engineering
Abstract/Summary:
With the rapid development of Internet technology, Web databases have become prevalent on the Web. In response to user requests, Web databases dynamically present the object information they store as HTML pages. The information embedded in these pages is collectively called the Deep Web, and it cannot be reached by traditional search engines. Recent research shows that the Deep Web contains a large amount of valuable information, so it has become a research hotspot that attracts increasing attention. This thesis studies Deep Web oriented data extraction and semantic annotation. Its main contributions are:
1) It presents in detail the relevant technologies and evaluation criteria of Web information extraction. Then, after introducing the problem of extracting Web objects from Deep Web search result pages, it proposes a system architecture to solve this problem.
2) Based on an analysis of the layout of search result pages, it combines the visual features of Web pages with the DOM model to propose the Page Layout based Data Region Finder Algorithm.
3) Based on an analysis of the generation model of search result pages, it proposes a method that extracts data records automatically by searching for Continuous Similar Node-Groups under the data region node.
4) Treating the semantic annotation of data items as a context-dependent stochastic process, it proposes a semantic annotation method based on the Maximum Entropy Model, guided by a Domain-Object Schema.
Finally, the thesis reports experiments on the proposed methods. The experiments show that these methods are effective.
Keywords/Search Tags: Deep Web, Web Object, Information Extraction, Semantic Annotation
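The abstract does not give the details of the record extraction step, so the following is only a minimal sketch of the general idea of looking for consecutive, structurally similar nodes under a data region node. It rests on several assumptions that are not from the thesis: each record is a single child node (the thesis speaks of node groups, which may span several siblings), similarity is measured with `difflib.SequenceMatcher` over flattened tag sequences, the threshold of 0.8 is arbitrary, and the helper names `tag_signature`, `similar`, and `longest_similar_run` and the sample markup are hypothetical.

```python
import xml.etree.ElementTree as ET
from difflib import SequenceMatcher


def tag_signature(node):
    """Flatten a subtree into the list of its tag names in document order."""
    return [el.tag for el in node.iter()]


def similar(a, b, threshold=0.8):
    """Assumed similarity test: tag sequences of the two subtrees mostly match."""
    ratio = SequenceMatcher(None, tag_signature(a), tag_signature(b)).ratio()
    return ratio >= threshold


def longest_similar_run(region, threshold=0.8):
    """Return the longest run of consecutive children in which each child
    is similar to the one immediately before it (empty list if no run of 2+)."""
    best, current = [], []
    for child in region:
        if current and similar(current[-1], child, threshold):
            current.append(child)
        else:
            current = [child]  # start a new candidate run
        if len(current) > len(best):
            best = list(current)
    return best if len(best) > 1 else []


# Hypothetical, well-formed stand-in for a data region of a result page.
SAMPLE = """
<div>
  <h2>Results</h2>
  <div><a>Title A</a><span>Author A</span><span>$10</span></div>
  <div><a>Title B</a><span>Author B</span><span>$12</span></div>
  <div><a>Title C</a><span>Author C</span><span>$9</span></div>
  <p>Next page</p>
</div>
"""

region = ET.fromstring(SAMPLE)
records = longest_similar_run(region)
titles = [record.find("a").text for record in records]
print(titles)
```

In this sketch the heading and the pager are excluded because their tag sequences share nothing with the repeated `div` blocks. A real result page would need an HTML-tolerant parser and a similarity measure tuned to the site, and the visual features mentioned in the abstract are not modeled here.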