
Algorithms for management of document-centric XML data

Posted on: 2006-03-07
Degree: Ph.D.
Type: Thesis
University: University of Kentucky
Candidate: Iacob, Ionut Emil
GTID: 2458390005995355
Subject: Computer Science
Abstract/Summary:
XML, initially designed for large-scale text publishing, has rapidly evolved into a standard for a wide variety of data exchange and representation applications. As the volume of XML data has grown, XML data management has become the subject of intensive research. Database research groups have concentrated on building database management frameworks around semistructured data represented as XML. At the same time, humanities research groups have concentrated on developing application-specific XML-compliant markup languages and on applying XML to the encoding of a wide array of documents.

Two major kinds of XML documents emerge from applications: data-centric and document-centric. Data-centric documents have a fairly regular structure and serve as a standard format for data exchange and for representing semistructured data. Document-centric XML generally has a much more irregular structure and is often used as a means of document markup. In recent years, a number of applications of XML to document-centric encoding have produced markup that cannot be stored in a single hierarchical XML document (the concurrent markup hierarchies problem). This is mainly a consequence of the multi-hierarchical nature of text documents, which combine a physical location hierarchy (pages and lines), a text structure hierarchy (paragraphs, sentences, and words), and others. A prominent example of document-centric XML with multiple hierarchies is the XML encoding of manuscript folio images: the heterogeneous information to be encoded (from both text and images) is very rarely hierarchical.

The problem of concurrent markup hierarchies in document-centric XML encodings has attracted the attention of a number of humanities researchers in recent years. Previously proposed solutions rely on the XML expertise of human encoders and on their ability to maintain correct schemas for complex markup languages.
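To make the concurrent hierarchies problem concrete, the following minimal sketch treats each markup element as a half-open character span `(tag, start, end)` over the same text. It relies on a standard fact: a set of spans fits into a single XML tree only if no two spans partially overlap, meaning they intersect but neither contains the other. The helper names `properly_overlap` and `find_conflicts` and the sample text are hypothetical illustrations, not the thesis's formal definitions or algorithms.

```python
from itertools import combinations

def properly_overlap(a, b):
    """True if half-open spans a and b intersect but neither contains the other."""
    (s1, e1), (s2, e2) = a, b
    intersect = s1 < e2 and s2 < e1
    a_in_b = s2 <= s1 and e1 <= e2
    b_in_a = s1 <= s2 and e2 <= e1
    return intersect and not (a_in_b or b_in_a)

def find_conflicts(elements):
    """Return every pair of (tag, start, end) elements that cannot nest in one XML tree."""
    return [
        (x, y)
        for x, y in combinations(elements, 2)
        if properly_overlap(x[1:], y[1:])
    ]

# Hypothetical sample: one sentence runs across a manuscript line break.
TEXT = "The knight rode on. He came home."
lines = [("line", 0, 16), ("line", 16, 33)]   # physical hierarchy
sentences = [("s", 0, 19), ("s", 20, 33)]     # text structure hierarchy

print(find_conflicts(lines))               # → []
print(find_conflicts(sentences))           # → []
print(find_conflicts(lines + sentences))   # → [(('line', 16, 33), ('s', 0, 19))]
```

Each hierarchy is well-formed on its own. Combining them fails because the second line starts in the middle of the first sentence, and this is the situation that forces concurrent markup.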
This thesis introduces a framework that allows human encoders to concentrate on the semantic aspects of the encoding, while leaving the burden of maintaining the XML documents to the software. We formally define the notions of concurrent markup hierarchies and concurrent XML documents, and we give algorithms for document-centric XML data management, with a special focus on document-centric XML documents with concurrent markup.
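The abstract does not state the formal definition of a concurrent XML document. One common way to represent concurrent markup is to keep one well-formed XML document per hierarchy, with every document sharing the same root element and the same text content. The sketch below checks those two conditions using the standard library's `xml.etree.ElementTree`. Both conditions are assumptions chosen for illustration and are not necessarily the thesis's exact definition. The function names are also hypothetical.

```python
import xml.etree.ElementTree as ET

def root_and_text(xml_string):
    """Parse one hierarchy (raises ET.ParseError if not well-formed) and
    return its root tag plus all character data in document order."""
    root = ET.fromstring(xml_string)
    return root.tag, "".join(root.itertext())

def is_concurrent(documents):
    """Assumed condition: all documents share one root tag and identical text content."""
    return len({root_and_text(doc) for doc in documents}) == 1

# Hypothetical encodings of the same text under two hierarchies.
PHYSICAL = "<doc><line>The knight rode </line><line>on. He came home.</line></doc>"
STRUCTURAL = "<doc><s>The knight rode on.</s> <s>He came home.</s></doc>"

print(is_concurrent([PHYSICAL, STRUCTURAL]))  # → True
print(is_concurrent([PHYSICAL, "<doc><s>The knight rode on.</s></doc>"]))  # → False
```

With this representation each hierarchy stays a valid XML document. The cost is that software must keep the shared text synchronized across documents, which is the kind of maintenance burden the framework described above aims to take off the human encoder.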
Keywords/Search Tags: Document-centric XML, XML documents, Concurrent markup, Research groups have concentrated