Study On XML Engine

Posted on: 2005-07-19    Degree: Doctor    Type: Dissertation
Country: China    Candidate: G L Xiang    Full Text: PDF
GTID: 1118360122991367    Subject: Library science
Abstract/Summary:
XML has been widely adopted across many fields since the W3C introduced it in 1998. Many fields use XML-based languages to describe their documents and information, such as MathML, CML, VoiceML. As a result, large numbers of XML documents are being produced, and managing them has become a critical problem. This thesis addresses that problem. Its main contributions are:

(1) XML Engine design. The thesis designs an XML Engine and sets out the relationship among the XML Engine, XML databases and XML application systems. The XML Engine has three parts: a storage subsystem, an index subsystem and a query subsystem. The storage subsystem serves as the store for the index and query subsystems and also supplies interfaces to XML application systems. The index subsystem indexes the XML documents held in the storage subsystem and comprises a content index and a structure index. The query subsystem handles querying; it complies with XPath 1.0 and can also perform full-text queries over XML documents.

(2) XML index technology. The thesis describes the content index and the structure index for XML documents and gives a method for combining the two coherently. It solves three issues in the content index: storage of variable-length records, Chinese word and phrase indexing, and faster index construction. Four index files implement the content and structure indexes: a Chinese character index, an English string index, an element index and an attribute index. The thesis first gives the pre-post node labeling method, then proposes the tree-adjacent table, transforms the DOM tree into a tree-adjacent table, and finally creates the element index and attribute index from that table.

(3) XML query technology. The thesis describes content queries and structure queries for XML documents and explains how the two are integrated. It briefly discusses three kinds of content query: simple query (also called matching), field query and Boolean query. It gives five basic path query expressions: the simple regular path expression, the order regular path expression, the attribute regular path expression, the value regular path expression and the Kleene closure regular path expression. It summarizes four operations over these five basic regular path expressions: the PC operation (Parent-Child), the AD operation (Ancestor-Descendant), the CO operation (Containment) and the OR operation (Order).

The thesis adopts the following research methods: document investigation, logical deduction, generalization and demonstration. Different methods are applied to different research objects to ensure the soundness and credibility of the research procedure and its results. There are 44 figures and 19 tables in the thesis.
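The pre-post node labeling and the PC and AD operations mentioned above can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not specify the tree-adjacent table or the index file layout, so the sketch keeps labels in a plain dictionary, and the function names `label_tree`, `is_ancestor` and `is_parent` and the sample document are hypothetical. It assumes the common form of pre/post labeling, in which each element receives its preorder rank, postorder rank and depth.

```python
import xml.etree.ElementTree as ET


def label_tree(root):
    """Map each element to (pre, post, level) ranks.

    Hypothetical helper: pre is the preorder rank, post the postorder
    rank, and level the depth below the root (root is level 0).
    """
    labels = {}
    counters = {"pre": 0, "post": 0}

    def visit(node, level):
        pre = counters["pre"]
        counters["pre"] += 1
        for child in node:
            visit(child, level + 1)
        # The postorder rank is assigned only after all children are done.
        labels[node] = (pre, counters["post"], level)
        counters["post"] += 1

    visit(root, 0)
    return labels


def is_ancestor(a, d):
    """AD test: a starts before d in preorder and ends after it in postorder."""
    return a[0] < d[0] and a[1] > d[1]


def is_parent(p, c):
    """PC test: an ancestor exactly one level above the child."""
    return is_ancestor(p, c) and c[2] == p[2] + 1


# Sample document (an assumption made for illustration only).
root = ET.fromstring(
    "<book><title>XML</title><chapter><section>Index</section></chapter></book>"
)
labels = label_tree(root)
by_tag = {elem.tag: label for elem, label in labels.items()}

print(is_ancestor(by_tag["book"], by_tag["section"]))   # AD holds
print(is_parent(by_tag["book"], by_tag["section"]))     # PC does not hold
print(is_parent(by_tag["chapter"], by_tag["section"]))  # PC holds
```

With labels of this kind, a structural test between two elements reduces to integer comparisons, so it can be answered from an index without walking the DOM tree again.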
Keywords/Search Tags:XML, index technology, query technology, structure index, structure query, engine