Research On Expression And Extraction Of Web Database’s Characteristics

Posted on:2013-11-11

Degree:Master

Type:Thesis

Country:China

Candidate:L Zhao

Full Text:PDF

GTID:2248330395459957

Subject:Management Science and Engineering

Abstract/Summary:

With the development of the Internet, the Web keeps growing deeper. The Web can be divided into the Surface Web and the Deep Web. The former is the set of pages that traditional search engines can index; the latter generally refers to online databases that are accessible through the Web. The Deep Web holds more information than the Surface Web and has the advantage in both quantity and quality of information, so it has become one of the main means of obtaining information. Because much of this information is locked in databases and many pages are generated dynamically in response to specific queries, retrieving Web databases (WDBs) will not only greatly expand search capabilities but also provide a convenient way to find information.

Query interfaces are the only path to access WDBs, and each query interface corresponds to a different query mode. To find the right information, users fill in the interface and submit requests. However, with the development of scripting and dynamic web technologies such as JavaScript and Ajax, query interfaces are becoming more complex, and the data in WDBs is diverse. Therefore, to access WDBs automatically and improve search capabilities, we need to quickly identify the characteristics of such dynamic query interfaces, find the constraint relations among their elements, describe WDB data quantitatively, and extract these characteristics.

To address these problems, this paper studies the representation of WDB characteristics, Web database sampling, and methods for extracting WDB characteristics. The specific studies are:

(1) Expression method for the characteristics of WDB query interfaces and WDB data

In this paper, the attributes of WDB data are divided into three categories: text attributes, numeric attributes, and categorical attributes. For a text attribute, we use word frequency to represent the characteristic. For a numeric attribute, because its values are continuous and the normal distribution applies widely, we use the expectation and deviation to express the characteristic. For a categorical attribute, we use a statistical method to express the characteristic. After obtaining the characteristics of each type, we form the final feature vector. Finally, because ontologies offer good knowledge representation and reasoning ability, this study uses an ontology to represent the query interface.

(2) Web database sampling based on a probability and statistics model

To extract the characteristics of a WDB, this paper provides a method for sampling Web databases based on a probability and statistics model. Sampling a WDB has five key steps: ① construct the initial query Q and the characteristics vector; ② execute query Q and get the query results from the WDB; ③ add the results to the sample set S, analyze them, and calculate the probability and conditional probability of the various characteristics to prepare for the next query; ④ judge whether the loop should stop; ⑤ construct the next query. According to the experiments, the sampling method is reasonable and effective.

(3) Extraction method for the characteristics of WDB query interfaces and WDB data

Based on the above research, this paper presents methods for extracting the characteristics of WDB query interfaces and WDB data. Firstly, it presents extraction methods that suit query interfaces well: a method based on regular expressions for form information and a method based on Watir and Ajax for relationships. These methods extract the context information, attribute information, and relationship information of the query interface well. Secondly, to extract the characteristics of WDB data, we give three methods: for text data, we use word frequency; for numeric data, we use the normal distribution; for categorical data, we use the ratio of the number of records.
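
As an illustration of the three kinds of data characteristics described in the abstract, the following is a minimal Python sketch, not the thesis's implementation. The record fields (`title`, `price`, `genre`), the function names, and the `top_k` cutoff are all hypothetical choices made for this example; the sketch reads "expectation and deviation" as mean and standard deviation.

```python
import math
import re
from collections import Counter


def text_feature(values, top_k=3):
    """Relative frequency of the top_k most common words in a text attribute."""
    words = [w for v in values for w in re.findall(r"[a-z0-9]+", v.lower())]
    total = len(words)
    if total == 0:
        return {}
    return {w: c / total for w, c in Counter(words).most_common(top_k)}


def numeric_feature(values):
    """Mean and (population) standard deviation of a numeric attribute,
    i.e. the two parameters of a fitted normal distribution."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / n
    return mean, math.sqrt(variance)


def categorical_feature(values):
    """Share of records that fall into each category."""
    n = len(values)
    return {category: count / n for category, count in Counter(values).items()}


# Hypothetical sample records; in the thesis these would come from WDB sampling.
records = [
    {"title": "deep web search", "price": 10.0, "genre": "book"},
    {"title": "web database sampling", "price": 20.0, "genre": "book"},
    {"title": "deep web crawling", "price": 30.0, "genre": "dvd"},
    {"title": "query interface", "price": 40.0, "genre": "book"},
]

# The per-attribute characteristics together form the feature vector.
feature_vector = {
    "title": text_feature([r["title"] for r in records]),
    "price": numeric_feature([r["price"] for r in records]),
    "genre": categorical_feature([r["genre"] for r in records]),
}
```

Keeping one small function per attribute type mirrors the abstract's three-way split, so a different statistic for one type can be swapped in without touching the others.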
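
The five sampling steps can be sketched as a query loop. This is a rough sketch under stated assumptions, not the thesis's algorithm: `TOY_WDB` and `run_query` are hypothetical stand-ins for a real Web database and its query interface, the stopping rule (`patience` consecutive queries with no new records) is an assumption, and so is the rule for choosing the next query term (the unused term with the lowest conditional probability given the current term, preferring frequent terms on ties). The abstract does not specify either rule.

```python
from collections import Counter

# Hypothetical stand-in for a Web database: record id -> record text.
TOY_WDB = {
    1: "deep web search engine",
    2: "web database sampling",
    3: "query interface extraction",
    4: "deep web query interface",
    5: "database sampling statistics",
    6: "ontology query interface",
}


def run_query(term, limit=3):
    """Simulate submitting a keyword query; returns at most `limit` records."""
    hits = [(rid, text) for rid, text in sorted(TOY_WDB.items())
            if term in text.split()]
    return hits[:limit]


def sample_wdb(initial_term, max_queries=10, patience=2):
    sample = {}            # sample set S
    doc_freq = Counter()   # characteristics vector: records containing each term
    used = set()
    term = initial_term    # step 1: initial query Q
    stale = 0
    for _ in range(max_queries):
        results = run_query(term)                      # step 2: execute Q
        used.add(term)
        new = [(rid, text) for rid, text in results if rid not in sample]
        for rid, text in new:                          # step 3: extend S
            sample[rid] = text
            doc_freq.update(set(text.split()))
        if not sample:
            break                                      # nothing to analyze
        # Step 3 (continued): P(t) and P(t | current term) over the sample.
        n = len(sample)
        prob = {t: c / n for t, c in doc_freq.items()}
        with_term = [text.split() for text in sample.values()
                     if term in text.split()]
        cond = {t: sum(t in ws for ws in with_term) / max(len(with_term), 1)
                for t in prob}
        # Step 4: stop after `patience` consecutive queries with no new records.
        stale = 0 if new else stale + 1
        if stale >= patience:
            break
        # Step 5: build the next query from an unused term (assumed rule).
        candidates = [t for t in prob if t not in used]
        if not candidates:
            break
        term = min(candidates, key=lambda t: (cond[t], -prob[t], t))
    return sample


sample = sample_wdb("web")
```

Choosing terms that rarely co-occur with the current one is only one plausible way to use the conditional probabilities; it tends to steer the next query toward records the sample has not seen yet.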

Keywords/Search Tags:

WDB, Query Interface, Characteristic Expression, Characteristic Extraction, Database Sampling

Related items

1	Based On The Epp Parameters Of The Field Emission Measurement System
2	Natural Language Interface Of Database And The Application Of The Audit
3	An investigation of the end user database query process
4	Object - Relational Database Orbase Inquiries And Optimize Processing
5	Research On Query Interface Technologies To Relational Databases
6	Research Of Database Query System Based On Natural Language Interface
7	Research On A Generalized Multimedia Database Model Based On MPEG-7
8	Research And Implementation Of Sampling-Based Aggregation Query System On Big Data
9	Well-definedness, semantic type-checking, and type inference for database query languages
10	A Store And Query System Design Of XML Based On Relation Database