
Research On The Storing And Querying Of Semi-Structured Data On The Web

Posted on: 2006-07-30
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Qin
Full Text: PDF
GTID: 1118360185963425
Subject: Computer Science and Technology
Abstract/Summary:
The Internet contains a huge amount of information. How to find the required information quickly and efficiently remains an open problem for network applications. Much of the information on the Internet is carried by Web pages. The most notable characteristic of Web data is that it is semi-structured: semi-structured data has no unified structure, and the data and its structure are interrelated. As the number of Web pages grows, it becomes ever harder to find the needed information on the Web by character string matching alone, so semantic-based querying has become an important means of obtaining Web information. The traditional Web description language, HTML, expresses only the data content and not the structure of Web data, so it does not lend itself to semantic-based querying. XML is a newer semi-structured data description language. As a Web data description language, it overcomes this disadvantage of HTML. XML is taking the place of HTML and is becoming a new-generation standard for describing and exchanging Web data.

The emergence of huge volumes of XML data calls for ways to manage it. Because Web data described in XML is semi-structured, existing techniques for structured or unstructured data do not work well on it, and traditional data management techniques (such as relational databases and object-oriented databases) do not adapt well to XML data management. The management of XML-based semi-structured data has therefore become a hot topic in the field of data management.

This dissertation focuses on the following four aspects of storing and querying XML-based Web data:

1. Web data model

The Web data model is the foundation of Web data management. Existing Web data models take into account neither the heterogeneity of Web data nor the nesting of Web data, which leads to query loops. To address these two problems, a new Web data model is presented, namely the XML-based Web data model (XWDM). By extending some definitions in the XQuery 1.0 and XPath 2.0 data model, XWDM provides a solution to the problem of name heterogeneity, which results from the same data element being named differently in different Web pages, and to the problem of infinite loops, which is common in Web data querying.

2. The problem of XML-based Web data storage

The fact that Web data described in XML has no unified schema hampers Web data querying. Based on the findings, a new storage model for Web data based on relational databases is presented, namely XPED. XPED is a storage model based on model mapping. It transforms XML documents that have no schema information into three relational tables that can be stored in a relational database. In this way, the querying of Web information can be transferred to data querying in relational databases. The mapping method presented...
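The model-mapping idea described under item 2 can be illustrated with a minimal sketch. The abstract does not say which three tables XPED uses, so the layout below (a `path` table, an `element` table, and a `text_value` table) is purely an assumption chosen for illustration, as are the function names `shred` and `query_path` and the sample document. It is a generic path-based shredding of a schema-less XML document into SQLite, not the dissertation's actual mapping.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical three-table layout (an assumption, not the XPED schema).
SCHEMA = """
CREATE TABLE path(path_id INTEGER PRIMARY KEY, expr TEXT UNIQUE);
CREATE TABLE element(node_id INTEGER PRIMARY KEY, parent_id INTEGER, path_id INTEGER);
CREATE TABLE text_value(node_id INTEGER, value TEXT);
"""


def shred(xml_text, conn):
    """Store a schema-less XML document in three relational tables."""
    conn.executescript(SCHEMA)
    path_ids = {}  # root-to-node path expression -> path_id
    next_id = 0    # node ids are assigned in document order

    def visit(elem, parent_id, parent_path):
        nonlocal next_id
        next_id += 1
        node_id = next_id
        expr = parent_path + "/" + elem.tag
        if expr not in path_ids:
            path_ids[expr] = len(path_ids) + 1
            conn.execute("INSERT INTO path VALUES (?, ?)", (path_ids[expr], expr))
        conn.execute(
            "INSERT INTO element VALUES (?, ?, ?)",
            (node_id, parent_id, path_ids[expr]),
        )
        text = (elem.text or "").strip()
        if text:
            conn.execute("INSERT INTO text_value VALUES (?, ?)", (node_id, text))
        for child in elem:
            visit(child, node_id, expr)

    visit(ET.fromstring(xml_text), None, "")
    conn.commit()


def query_path(conn, expr):
    """Answer a simple absolute path query with relational joins."""
    rows = conn.execute(
        """SELECT t.value
           FROM path p
           JOIN element e ON e.path_id = p.path_id
           JOIN text_value t ON t.node_id = e.node_id
           WHERE p.expr = ?
           ORDER BY e.node_id""",
        (expr,),
    )
    return [row[0] for row in rows]


# Sample document invented for this sketch.
SAMPLE = "<site><book><title>XML</title></book><book><title>Web</title></book></site>"

conn = sqlite3.connect(":memory:")
shred(SAMPLE, conn)
titles = query_path(conn, "/site/book/title")
```

In this sketch no schema or DTD is consulted: the tables are filled purely from the document instance, and a path query such as `/site/book/title` becomes an ordinary SQL join over the three tables.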
Keywords/Search Tags: XML, semi-structured data, schema, regular path expression, data model, storage model, model mapping, structure index, path join, number schema