Study On Schema Recognition Oriented To Response Page Of Deep Web

Posted on: 2009-05-25
Degree: Master
Type: Thesis
Country: China
Candidate: X Zhang
Full Text: PDF
GTID: 2178360308979749
Subject: Computer application technology
Abstract/Summary:
With the development of information technology, the amount of information on the Web is growing rapidly. According to the depth at which information resides, the Web can be divided into two categories: the Surface Web and the Deep Web. The latter refers to data sources that are stored in databases and cannot be reached through hyperlinks, only through dynamically generated web pages. Some statistics show that the volume of information on the Deep Web, its access volume, and its growth rate are all far higher than those of the Surface Web. As the number of Web databases increases, querying the Deep Web is gradually becoming the main way to acquire information, which makes automatically acquiring Deep Web data sources for large-scale integration all the more important.

At present, the main tool for acquiring Web information is the search engine. However, traditional search engines can access only Surface Web information and cannot index the dynamic data sources of the Deep Web. Search engines that support the Deep Web are therefore in wide demand, but the characteristics of the Deep Web make this requirement technically very difficult to meet.

This thesis analyzes Deep Web query interfaces and their response pages, proposes a search engine architecture for the Deep Web, describes its design, and elaborates two schema (pattern) extraction algorithms in the pre-processing subsystem: a query-based algorithm and an input-interface-based algorithm.

Experiments show that the proposed methods have good recognition capability under different conditions. Combining the two approaches solves the input interface recognition problem well and also provides theoretical support for search engines on the Deep Web.
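The abstract does not give the details of either algorithm, so the following is only an illustrative sketch of what input-interface-based schema extraction could look like in its simplest form: reading the fields of an HTML query form and recording each field's name, type, and selectable values. The class `FormSchemaParser`, the function `extract_interface_schema`, and the sample form are hypothetical names chosen for this example and are not taken from the thesis.

```python
from html.parser import HTMLParser

# Input types that do not describe a queryable attribute (assumption of this sketch).
NON_QUERY_TYPES = ("submit", "hidden", "button", "reset")


class FormSchemaParser(HTMLParser):
    """Hypothetical helper: collects name, type, and options of each form field."""

    def __init__(self):
        super().__init__()
        self.fields = []         # extracted fields, in page order
        self._select = None      # the <select> field currently open, if any
        self._in_option = False  # True while inside an <option> element

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "input":
            field_type = a.get("type", "text")
            if field_type not in NON_QUERY_TYPES:
                self.fields.append(
                    {"name": a.get("name"), "type": field_type, "options": []}
                )
        elif tag == "select":
            self._select = {"name": a.get("name"), "type": "select", "options": []}
            self.fields.append(self._select)
        elif tag == "option" and self._select is not None:
            self._in_option = True

    def handle_endtag(self, tag):
        if tag == "select":
            self._select = None
            self._in_option = False
        elif tag == "option":
            self._in_option = False

    def handle_data(self, data):
        # Option text is treated as one allowed value of the enclosing field.
        if self._in_option and self._select is not None and data.strip():
            self._select["options"].append(data.strip())


def extract_interface_schema(html):
    """Return the list of query fields found in the given HTML text."""
    parser = FormSchemaParser()
    parser.feed(html)
    parser.close()
    return parser.fields


# Made-up query interface used only to exercise the sketch.
SAMPLE_FORM = """
<form action="/search">
  Title: <input type="text" name="title">
  Format: <select name="format">
    <option>Hardcover</option><option>Paperback</option>
  </select>
  <input type="submit" value="Search">
</form>
"""

schema = extract_interface_schema(SAMPLE_FORM)
```

A real query interface would also require associating visible labels with fields and handling scripts and layout tables, which this sketch deliberately leaves out.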
Keywords/Search Tags:Deep Web, Schema extract, Schema match, Page analysis, Search engine