Research on Pattern Matching of Deep Web Query Interfaces

Posted on: 2016-02-22
Degree: Master
Type: Thesis
Country: China
Candidate: G D He
Full Text: PDF
GTID: 2298330467495903
Subject: Software engineering
Abstract/Summary:
With the rapid development of the Internet and computer science, web information is published to servers in two ways, statically and dynamically, and the number of web pages backed by online databases has become enormous. Pages on web servers can be divided into two kinds: the Surface Web and the Deep Web. Surface Web pages are reached through static links, and in most cases search engines can crawl their content. Deep Web content, by contrast, resides in online database servers and is generated dynamically from the query conditions a user submits on a web page. Traditional search engines cannot crawl this hidden information, even though the data in the Deep Web is valuable.
At present the Deep Web is the main source from which people obtain information, and its content is stored in structured form and distributed across online databases. The Deep Web holds a large amount of data, much of it of very high quality, so the question is how to obtain that data quickly and effectively when search engines cannot reach it. Research on Deep Web query results aims to extract data automatically from Deep Web databases so that knowledge can be obtained accurately and quickly.
Query conditions reflect the various combinations of input a user enters in a form. Because a Deep Web source focuses on one particular domain, this thesis introduces the semantic relations of WordNet into the pattern matching of Deep Web query interface forms.
This thesis proposes a new method for pattern matching of Deep Web query interfaces. The framework has four parts:
1) Locate and identify the query interface forms in the web page document, use heuristic rules to exclude forms that are not query interfaces, and obtain a list of the locations of the query interface forms present in the document.
2) Analyze the query interface form, strip out its properties, and parse it to obtain the text markup and control markup information of the query interface.
3) Automatically extract the form properties of the query interface based on internal HTML encoding rules and the visual-unit rules of the web page.
4) Use WordNet to guide the extraction of the semantic relations among form properties and the location and identification of the query interface form; then recombine the form properties according to those semantic relations to obtain a form matching model based on semantic relations.
This thesis studies and designs a matching method for Deep Web query interfaces based on WordNet semantic relations, locating and identifying forms by the semantic similarity of text labels and control markup. It presents a new method for handling query interface properties and computes the semantic similarity between attribute labels and control labels for the query interface matching model. Experimental results show that the algorithm is effective and feasible in practical application.
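Steps 1) and 2) of the framework above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not state its heuristic rules, so the two rules used here (reject forms containing a password field, require at least two query-style controls) and all names such as `FormCollector`, `is_query_interface`, and `MIN_QUERY_CONTROLS` are assumptions chosen for illustration. Only the standard-library `html.parser` module is used.

```python
from html.parser import HTMLParser

# Assumed heuristic parameters; the thesis abstract does not give its actual rules.
MIN_QUERY_CONTROLS = 2
QUERY_CONTROL_TYPES = {"text", "search", "select", "textarea", "checkbox", "radio"}


class FormCollector(HTMLParser):
    """Collect every <form> together with its controls and <label> texts."""

    def __init__(self):
        super().__init__()
        self.forms = []        # finished forms, in document order
        self._form = None      # form currently being parsed, if any
        self._in_label = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._form = {"action": attrs.get("action") or "",
                          "controls": [], "labels": []}
        elif self._form is not None:
            if tag == "input":
                ctype = (attrs.get("type") or "text").lower()
                self._form["controls"].append((ctype, attrs.get("name") or ""))
            elif tag in ("select", "textarea"):
                self._form["controls"].append((tag, attrs.get("name") or ""))
            elif tag == "label":
                self._in_label = True

    def handle_endtag(self, tag):
        if tag == "label":
            self._in_label = False
        elif tag == "form" and self._form is not None:
            self.forms.append(self._form)
            self._form = None

    def handle_data(self, data):
        # Keep only the visible text that appears inside a <label> of a form.
        if self._in_label and self._form is not None and data.strip():
            self._form["labels"].append(data.strip())


def is_query_interface(form):
    """Assumed heuristics: no password field, enough query-style controls."""
    types = [ctype for ctype, _ in form["controls"]]
    if "password" in types:          # most likely a login or registration form
        return False
    return sum(t in QUERY_CONTROL_TYPES for t in types) >= MIN_QUERY_CONTROLS


def find_query_interfaces(html):
    """Return the forms in `html` that pass the heuristic filter."""
    collector = FormCollector()
    collector.feed(html)
    collector.close()
    return [form for form in collector.forms if is_query_interface(form)]


PAGE = """
<form action="/login"><label>User</label><input type="text" name="u">
<input type="password" name="p"><input type="submit"></form>
<form action="/search"><label>Title</label><input type="text" name="title">
<label>Author</label><input type="text" name="author">
<select name="format"><option>Any</option></select>
<input type="submit" value="Search"></form>
"""

for form in find_query_interfaces(PAGE):
    print(form["action"], form["labels"], form["controls"])
```

On the sample page the login form is rejected and only the search form is kept. The label texts and control names collected this way are the kind of input that a later semantic matching stage, such as the WordNet-based step described in part 4), would compare.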
Keywords/Search Tags:DeepWeb, query interface, pattern matching, WordNet