Techniques Of XML Dynamic Labeling Schema And Distributed Management

Posted on: 2013-01-26
Degree: Master
Type: Thesis
Country: China
Candidate: Y C Fan
GTID: 2218330374467228
Subject: Computer software and theory
Abstract/Summary:
With the development of Internet technology, the amount of unstructured data increases day by day, and research on unstructured data management has great theoretical significance and practical value. XML, one of the primary technologies for unstructured data, has been widely used, and how to manage XML data efficiently has become a challenging and popular research problem in the database area. Meanwhile, cloud computing is a new trend, and how to meet the needs of massive data computing in a cloud environment is a major topic of current discussion in computing. With the development of XML applications and large-scale distributed computing technology, XML data management in distributed systems has become a new research direction. This thesis presents XML data management techniques in detail; its main contributions focus on XML dynamic labeling schemes and on distributed storage and access management for XML.

In the domain of XML labeling schemes, most efficient index and query techniques over XML (Extensible Markup Language) data are based on a labeling scheme, which helps quickly determine ancestor-descendant and parent-child relationships between any two nodes. Current basic labeling schemes, such as the containment scheme and the prefix scheme, cannot avoid re-labeling when XML documents are updated. After analyzing the essence of existing dynamic XML labels such as compact dynamic binary string (CDBS) and vector encoding, this thesis gives a common unifying framework for numeric-based generalized dynamic labels, which can be instantiated as a variety of dynamic labels according to different user-defined value comparison methods. It also proposes a novel dynamic labeling scheme called the radical sign label.
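To illustrate the re-labeling problem described above, here is a minimal sketch of a containment (interval) labeling scheme. It is not the thesis's implementation: the function names, the toy document, and the use of consecutive integers are assumptions made for illustration. Each node receives a `(start, end, level)` label during a depth-first traversal, an ancestor's interval strictly contains its descendants' intervals, and a parent is an ancestor exactly one level up.

```python
import xml.etree.ElementTree as ET


def containment_labels(root):
    """Return {tag: (start, end, level)} from one depth-first traversal.

    Assumes every tag in the toy document is unique, so tags can serve as keys.
    """
    labels = {}
    counter = 0

    def visit(node, level):
        nonlocal counter
        counter += 1
        start = counter
        for child in node:
            visit(child, level + 1)
        counter += 1
        labels[node.tag] = (start, counter, level)

    visit(root, 0)
    return labels


def is_ancestor(a, d):
    """a is an ancestor of d if a's interval strictly contains d's interval."""
    return a[0] < d[0] and d[1] < a[1]


def is_parent(p, c):
    """p is the parent of c if it is an ancestor exactly one level above."""
    return is_ancestor(p, c) and c[2] == p[2] + 1


def has_room(left, right):
    """True if two unused integers exist between adjacent siblings left and right."""
    return right[0] - left[1] - 1 >= 2


# Hypothetical toy document: a has children b and d; b has child c.
labels = containment_labels(ET.fromstring("<a><b><c/></b><d/></a>"))
```

With consecutive integers, `has_room(labels["b"], labels["d"])` is false: a new element inserted between `b` and `d` has no free `(start, end)` pair, so following labels must be reassigned. Dynamic labeling schemes aim to avoid exactly this.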
Extensive experiments show that the radical sign label performs well for initialization, insertion and query operations; for skewed insertion in particular, its storage cost is lower than that of earlier methods.

In the domain of XML storage and access management, traditional centralized and small-scale distributed data processing technologies struggle to meet the needs of massive data management. The Hadoop Distributed File System (HDFS) provides a useful platform for massive data management because of its scalability, high availability and fault tolerance. Based on HDFS and the MapReduce programming framework, this thesis designs and implements a storage and access management system for massive XML data in a distributed environment. The system provides distributed storage for massive XML data and quick access to specified XML data, and experiments on massive XML data sets representing audio feature data demonstrate its availability and effectiveness. The key techniques of the system are as follows. First, we design a logical XML tree data structure to represent the audio feature data. Second, we build an index for the XML data using GrayCode and design the structure for distributed XML storage on the HDFS platform. Third, Memcached is introduced as a distributed cache so that data can be accessed efficiently. The system is not limited to audio feature data: any other data that can be represented with the XML tree model can also be managed in it.
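The abstract does not say how GrayCode is used to build the index, so the following is only a sketch of the standard binary-reflected Gray code itself, not of the thesis's index structure. The function names are assumptions. Its defining property is that consecutive integers map to codes that differ in exactly one bit.

```python
def to_gray(n):
    """Convert a non-negative integer to its binary-reflected Gray code."""
    return n ^ (n >> 1)


def from_gray(g):
    """Invert to_gray by XOR-ing together all right shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


# 3-bit Gray codes for 0..7, as zero-padded binary strings.
codes = [format(to_gray(i), "03b") for i in range(8)]
# codes == ['000', '001', '011', '010', '110', '111', '101', '100']
```

Each adjacent pair in `codes` differs in a single bit position, and `from_gray(to_gray(i)) == i` for every non-negative integer `i`.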
Keywords/Search Tags: XML, Dynamic Labeling Scheme, CDBS, Vector Encoding, RadicalSign, Distributed system, HDFS, MapReduce, GrayCode, Memcached