Study On Data Annotation Of Deep Web Data Integration System

Posted on: 2010-09-26
Degree: Master
Type: Thesis
Country: China
Candidate: Y Chang
Full Text: PDF
GTID: 2198360302961986
Subject: Computer application technology
Abstract/Summary:
As research on information retrieval in networked environments advances, Deep Web data integration systems are attracting growing attention. The Deep Web is defined relative to the Surface Web: it refers to Web information that ordinary search engines cannot retrieve because it is generated dynamically in response to query terms. Data annotation is an important component of a Deep Web data integration system. Its main task is to annotate the data extracted from search results so that the data can be identified and processed by computer.

Based on an analysis of the search result pages of Deep Web sites and the style of their data, the dissertation introduces the concepts of the result schema and the object model of the annotation domain, gives formal descriptions of both, and describes its overall approach to annotation. It classifies search result content into three types: first, content that contains domain knowledge; second, content that does not contain domain knowledge; and third, a mixed type in which some content contains domain knowledge and some does not. For these three types, the dissertation applies two basic annotation methods, domain knowledge annotation and decision tree annotation, either separately or in combination. To avoid repeatedly processing data from the same Deep Web site and so improve efficiency, it builds a model annotation method on top of the two basic methods. It also uses two auxiliary annotation methods, entity annotation and heuristic rule annotation, alongside the others, and analyzes in detail the issues encountered when building and using annotation models.

Experiments indicate that these methods handle the different situations well and that the annotation results are good.
Keywords/Search Tags: Deep Web, Data annotation, Domain knowledge, Decision tree