Research On Schema Extraction From Deep Web Query Interface

Posted on: 2012-05-15  Degree: Master  Type: Thesis
Country: China  Candidate: H F Zhu  Full Text: PDF
GTID: 2178330332999789  Subject: Computer software and theory
Abstract/Summary:
With the rapid development of the World Wide Web, the amount of information on the Web has grown far beyond what anyone imagined, and vast amounts of it are available for use. Giving people effective access to high-quality Web information has become an urgent task in Web search. Web information is published through both static and dynamic pages in enormous numbers, and its content and form vary constantly with the businesses and individuals who publish it. Although Web information appears chaotic, it can be divided into the Surface Web and the Deep Web according to whether traditional search engines can index it. The Surface Web is the collection of static pages that can be reached through hyperlinks and indexed by traditional search engines. The Deep Web mainly refers to online Web databases, whose pages traditional search engines cannot index. Compared with the Surface Web, the Deep Web holds more information and grows faster, and because each Deep Web source tends to focus on a single domain, its information is more specialized. Mining and studying Deep Web data is therefore very worthwhile.

An online Web database is hidden behind the query interface of a page, and users obtain the information they need by submitting queries through that interface. Because the query interface is the entrance to the database, mining the schema information of Deep Web query interfaces is a key step. A Deep Web query interface is given as an HTML form, in the code between the <form> and </form> tags. What is presented to the user is a combination of related query elements. Because a Deep Web source focuses on one domain, the semantic relationships in a domain ontology can be used to guide the extraction of the query interface schema. Schema extraction aims to obtain the attribute information of the query interface and to describe its query capabilities accurately, which provides key information for integrating query interfaces.

This thesis studies the query interface from two sides: its internal code and the information in its visual units. It defines control tags, text tags and layout tags in the internal code, and value input units, text messaging units and restricted selection units in the visual unit information, and it proposes an extraction framework for the query interface schema. The framework consists of the following steps. First, for pages whose HTML forms contain a query interface, heuristic rules filter out forms that are not query interfaces, which locates the query interface region. Second, the query interface is parsed into valid data units to obtain its tag information. Then, attributes are extracted using rules derived from observation and statistics of the internal code and of the visual unit information in the form. Finally, under the guidance of the domain ontology, the semantic relationships among the extracted attributes are identified and the attributes are regrouped to obtain the query interface schema at the semantic level.

Drawing on both the visual unit information and the internal code of the query interface, this thesis designs a domain-ontology-based algorithm for extracting Deep Web query interface schemas. Based on position, semantic tags and string similarity, it proposes a new representation of query interface attributes that reflects the semantic relationships between tags. Experimental results show that the method is effective and feasible and achieves high precision.
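
The first three steps of the framework described above (filtering out non-query forms, parsing the form into tagged units, and pairing labels with controls) can be illustrated with a minimal Python sketch. This is not the thesis's algorithm: the tag sets, the "no password field" filter, the "nearest preceding text" labelling rule, and the mapping from HTML controls to value input and restricted selection units are all assumptions made for illustration, as are the names parse_forms, is_query_form and extract_attributes.

```python
from html.parser import HTMLParser

# Assumed tag sets; the thesis defines control, text and layout tags
# but the abstract does not list their members.
CONTROL_TAGS = {"input", "select", "textarea"}
LAYOUT_TAGS = {"table", "tr", "td", "br", "p", "div"}

# Assumed mapping from HTML control kinds to the thesis's visual units.
UNIT_OF = {
    "text": "value input unit",
    "textarea": "value input unit",
    "select": "restricted selection unit",
    "radio": "restricted selection unit",
    "checkbox": "restricted selection unit",
}


class FormCollector(HTMLParser):
    """Collects one token list per <form>: (category, value, name)."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._current = None      # token list of the form being read
        self._in_select = False   # used to skip <option> text

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = []
            self.forms.append(self._current)
        elif self._current is not None:
            if tag in CONTROL_TAGS:
                if tag == "input":
                    kind = (attrs.get("type") or "text").lower()
                else:
                    kind = tag
                self._current.append(("control", kind, attrs.get("name") or ""))
                if tag == "select":
                    self._in_select = True
            elif tag in LAYOUT_TAGS:
                self._current.append(("layout", tag, ""))

    def handle_endtag(self, tag):
        if tag == "select":
            self._in_select = False
        elif tag == "form":
            self._current = None

    def handle_data(self, data):
        text = data.strip()
        if text and self._current is not None and not self._in_select:
            self._current.append(("text", text, ""))


def parse_forms(html):
    """Return the token lists of all forms found in an HTML string."""
    collector = FormCollector()
    collector.feed(html)
    return collector.forms


def is_query_form(tokens):
    """Assumed heuristic: reject login forms, require a queryable control."""
    kinds = [value for category, value, _ in tokens if category == "control"]
    if "password" in kinds:
        return False
    return any(kind in UNIT_OF for kind in kinds)


def extract_attributes(tokens):
    """Pair each queryable control with the nearest preceding text token."""
    attributes = []
    last_text = None
    for category, value, name in tokens:
        if category == "text":
            last_text = value
        elif category == "control" and value in UNIT_OF:
            attributes.append(
                {"label": last_text, "name": name, "unit": UNIT_OF[value]}
            )
    return attributes
```

Calling parse_forms on a page, keeping only the forms for which is_query_form returns True, and passing each to extract_attributes yields a flat attribute list per interface. The ontology-guided regrouping in the final step would operate on such a list and is not sketched here, since the abstract does not describe the ontology.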
Keywords/Search Tags: Deep Web, Query Interface, Domain Ontology, Schema Extraction