
Discovery Of Query Interfaces And Extraction Of Metadata Information On The Domain-Oriented Deep Web

Posted on: 2017-04-10
Degree: Master
Type: Thesis
Country: China
Candidate: J Xiang
Full Text: PDF
GTID: 2308330503467216
Subject: Software engineering
Abstract/Summary:
A great deal of information on the Internet is hidden beneath the surface of the Web, so ordinary search engines cannot return it directly, yet much of the information they cannot reach is important. These resources, which are stored in online databases and cannot be accessed through hyperlinks, are called Deep Web data. Obtaining Deep Web information requires submitting queries through query interfaces and establishing a Deep Web information integration system. The primary tasks are discovering query interfaces, classifying them, and constructing a metadata database. However, Deep Web information is stored in different databases and changes dynamically, and the corresponding query interfaces also change frequently, which makes Deep Web information difficult to obtain. Because these steps form the basis of the integration system, discovering Deep Web query interfaces and extracting their metadata correctly and effectively is particularly important.

To address these issues, the main contents of this paper are:
(1) Discovery of Deep Web query interfaces. This paper presents a rule-based method that crawls the relevant URLs to obtain the web page containing the query interface of each source, locates the query interface within the page, and extracts and stores the interface information.
(2) Extraction of Deep Web metadata. This paper mainly adopts a method based on visual features and user-defined rules to obtain the attribute information of each source's query interface and store it in the metadata library.
(3) Maintenance of the source information. The metadata is saved in tabular form, which facilitates the Deep Web integration system and the processing of query results.
(4) Use of multi-thread technology to improve the efficiency of updating the metadata of multiple information sources.

Finally, the extraction of Deep Web source metadata is verified by experiments. The experimental results show that the discovery and extraction methods are feasible and perform well, and that the results are applicable to Deep Web system integration and query result processing. In addition, the query interface metadata management module is extensible and provides a good foundation for the design of a Deep Web integration system.
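As an illustration of steps (1) to (3), the following is a minimal Python sketch of rule-based query interface discovery and attribute extraction. It is not the method of the thesis: the rules (reject forms with a password field, require at least one text, search, or select control), the names `FormCollector`, `is_query_interface`, `find_query_interfaces`, and `to_metadata_rows`, and the row layout are all assumptions made for the example. The visual features mentioned in the abstract are not modeled here.

```python
from html.parser import HTMLParser

# Hypothetical rule: control types that count as searchable attributes.
SEARCHABLE_TYPES = ("text", "search", "select")


class FormCollector(HTMLParser):
    """Collects every <form> on a page together with its input controls."""

    def __init__(self):
        super().__init__()
        self.forms = []        # forms that have been fully parsed
        self._current = None   # form currently being parsed, if any
        self._select = None    # <select> currently being parsed, if any

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "form":
            self._current = {
                "action": a.get("action") or "",
                "method": (a.get("method") or "get").lower(),
                "fields": [],
            }
        elif self._current is None:
            return  # ignore controls outside any form
        elif tag == "input":
            self._current["fields"].append({
                "name": a.get("name") or "",
                "type": (a.get("type") or "text").lower(),
                "options": [],
            })
        elif tag == "select":
            self._select = {"name": a.get("name") or "",
                            "type": "select", "options": []}
            self._current["fields"].append(self._select)
        elif tag == "option" and self._select is not None:
            self._select["options"].append(a.get("value") or "")

    def handle_endtag(self, tag):
        if tag == "select":
            self._select = None
        elif tag == "form" and self._current is not None:
            self.forms.append(self._current)
            self._current = None
            self._select = None


def is_query_interface(form):
    """Assumed rules: no password field, at least one searchable control."""
    types = [f["type"] for f in form["fields"]]
    if "password" in types:
        return False  # most likely a login or registration form
    return any(t in SEARCHABLE_TYPES for t in types)


def find_query_interfaces(html):
    """Return the forms on a page that pass the rules above."""
    collector = FormCollector()
    collector.feed(html)
    return [f for f in collector.forms if is_query_interface(f)]


def to_metadata_rows(form):
    """Flatten a form into tabular rows: (action, name, type, options)."""
    return [
        (form["action"], f["name"], f["type"], tuple(f["options"]))
        for f in form["fields"]
        if f["type"] in SEARCHABLE_TYPES
    ]
```

Calling `find_query_interfaces(page_html)` on a downloaded page and passing each result to `to_metadata_rows` yields rows that could be written to a metadata table. A real system would also need label extraction and classification by domain, which this sketch omits.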
Keywords/Search Tags:Deep Web, Discovery of query interfaces, Extraction of Metadata Information, Integration
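
The multi-threaded metadata update mentioned in item (4) of the abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how the threads are organized, so the thread pool, the function name `update_all_sources`, the `refresh` callback, and the `max_workers` default are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def update_all_sources(sources, refresh, max_workers=4):
    """Refresh the metadata of several sources concurrently.

    sources: dict mapping a source id to its query interface URL.
    refresh: callable(url) returning the new metadata for that source
             (hypothetical; e.g. download the page and re-extract rows).
    Returns (updated, failed): metadata by source id, and error text
    by source id for the sources whose refresh raised an exception.
    """
    updated, failed = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit one task per source and remember which source it is for.
        futures = {pool.submit(refresh, url): sid
                   for sid, url in sources.items()}
        for future in as_completed(futures):
            sid = futures[future]
            try:
                updated[sid] = future.result()
            except Exception as exc:
                # One failing source must not abort the other updates.
                failed[sid] = str(exc)
    return updated, failed
```

Threads suit this task because each refresh spends most of its time waiting on the network, so several sources can be updated in roughly the time of the slowest one. Collecting failures separately lets the old metadata of an unreachable source stay in place until the next update.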