
On theory and applications of reuse of multiple extensible markup languages (XMLs)

Posted on: 2006-07-09
Degree: Ph.D.
Type: Thesis
University: University of Southern California
Candidate: Chen, Yih-Feng
Full Text: PDF
GTID: 2458390008971355
Subject: Computer Science
Abstract/Summary:
The eXtensible Markup Language (XML) has been widely used in domains such as multimedia applications and databases because of its flexibility and self-describing capability. The number of XML-based markup languages has grown rapidly in recent years. Redundancies and conflicts exist among the large number of XML applications that have been designed for similar or identical purposes. One solution to this problem is to make existing XML schemas reusable by decomposing them into meaningful, properly scaled subschemas according to their syntactic and semantic information. New XML schemas can then be constructed from subschemas in the repository. This thesis investigates in detail how to extract XML subschemas for reuse and how to integrate them.

The task of integrating multiple XML subschemas, including the associated operations on schemas and instances, is called XML harmonization in this work. The axiom-based and object-oriented XML harmonization methodologies provide two approaches to reusing existing XML schemas. The axiom-based methodology is applied to XML instances that have regular partial structures. Users interact with XML files stored in the XML repository through the provided primitives. The object-oriented harmonization methodology is applied to non-data-centric application domains. We apply the approach to the multimedia domain as an illustrative example.

A systematic approach to the construction and organization of a repository of reusable XML subschemas is also proposed in this thesis. It consists of two main processes: schema processing and repository construction. Every element is a candidate root of a reusable subschema. We use two weighting schemes to quantify the information of an element based on its structure and its descendants. The elements are then partitioned using the K-means clustering algorithm to provide different resolutions of the repository. Subschemas rooted at elements of greater weight, namely those located in the L highest groups, are chosen as reusable ones. We use an (N + 1)-tuple to represent a subschema for more efficient storage. Tuples of subschemas are further used to remove redundancy in the repository. When the similarity measure between two subschemas is above a threshold, we eliminate the one with less information.
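
The selection step described above (weight each element, cluster the weights with K-means, keep the roots in the highest groups) can be illustrated with a minimal sketch. The abstract does not define the two weighting schemes, so the code below assumes a single hypothetical stand-in weight, the number of descendants of an element, and runs on a small XML instance tree rather than on a real schema. The function names `element_weights`, `kmeans_1d`, and `reusable_roots`, and the evenly spaced initial centres, are illustrative assumptions and not the thesis's method.

```python
import xml.etree.ElementTree as ET


def element_weights(root):
    """Return (tag, weight) pairs in post-order.

    ASSUMPTION: weight = number of descendants. This is only a stand-in
    for the thesis's two weighting schemes, which the abstract does not define.
    """
    weights = []

    def visit(elem):
        # Each child contributes itself plus all of its own descendants.
        count = sum(1 + visit(child) for child in elem)
        weights.append((elem.tag, count))
        return count

    visit(root)
    return weights


def kmeans_1d(values, k, iterations=50):
    """Plain Lloyd's K-means on scalars (requires k >= 2).

    Returns one group index per value, where 0 is the group with the
    lowest centre and k - 1 the group with the highest centre.
    """
    lo, hi = min(values), max(values)
    # ASSUMPTION: evenly spaced initial centres, chosen for determinism.
    centres = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: abs(v - centres[j])) for v in values]
        updated = []
        for j in range(k):
            members = [v for v, g in zip(values, labels) if g == j]
            updated.append(sum(members) / len(members) if members else centres[j])
        if updated == centres:
            break
        centres = updated
    # Re-rank groups by centre so that a higher index means a higher weight.
    order = sorted(range(k), key=lambda j: centres[j])
    rank = {j: r for r, j in enumerate(order)}
    return [rank[g] for g in labels]


def reusable_roots(root, k=3, top_groups=1):
    """Tags of elements whose weight falls in the `top_groups` highest groups."""
    weights = element_weights(root)
    groups = kmeans_1d([w for _, w in weights], k)
    return [tag for (tag, _), g in zip(weights, groups) if g >= k - top_groups]


SAMPLE = """
<library>
  <book><title/><author><first/><last/></author><year/></book>
  <member><name/><id/></member>
</library>
"""

if __name__ == "__main__":
    tree = ET.fromstring(SAMPLE)
    print(reusable_roots(tree, k=3, top_groups=2))
```

Here `k` plays the role of the repository resolution and `top_groups` the role of L: raising `top_groups` admits subschemas rooted at lighter elements. The tuple representation and the similarity-based redundancy removal are left out because the abstract does not specify them.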
Keywords/Search Tags: XML, Applications, Markup, Repository, Reuse