
Research and Design on Coding and Searching XML Data in a Distributed System

Posted on: 2016-12-25
Degree: Master
Type: Thesis
Country: China
Candidate: Q Wei
GTID: 2308330479479783
Subject: Computer application technology
Abstract/Summary:
With the rapid development of computer science and Internet technology, information, the carrier of human civilization, has become a double-edged sword in modern life. On the one hand, thanks to the popularity of the Internet, people can obtain all kinds of information anytime and anywhere; on the other hand, extracting meaningful information quickly and accurately from this vast ocean of data has become very difficult. As one of the standards for information interchange, XML (eXtensible Markup Language) is widely used in many fields, such as databases, Web technology, and the Internet of Things. At present there are three mature approaches to processing XML data: storing it in a native XML database (such as eXist-db), storing it in binary form in a relational database (such as SQL Server), or keeping XML documents directly as files in a file system. However, as the volume of XML data grows and XML data structures are upgraded and changed, none of these three approaches can meet the requirements of efficient, centralized management of large-scale XML data.
Drawing on cloud computing technology, this thesis researches and designs a scheme for encoding and retrieving large-scale XML data. First, based on the characteristics of XML and the storage structure of the HBase database, it uses a graph-based parallel encoding to store both the overall structure of an XML document and the information of each node in HBase. Then it translates a commonly used subset of XPath syntax into calls to the HBase API in order to perform parallel retrieval and data mining. Finally, sample data are used to verify the validity of the design and to measure query performance. In addition, storage and querying are built on the open-source Hadoop platform, which makes it easy to develop distributed applications that run on clusters of inexpensive machines and process massive data sets. The Hadoop framework provides the high availability and efficiency of the overall system, so this thesis focuses on algorithms and system architecture rather than on task scheduling and inter-node communication in the distributed system.
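The abstract does not spell out the encoding itself, so the following is only a minimal sketch of the general idea, under stated assumptions. It labels each XML node with a zero-padded Dewey-style path (an assumed stand-in, not necessarily the thesis's scheme), stores the nodes in a plain Python dict that plays the role of an HBase table keyed by row key, and resolves a simple child-axis path by prefix matching, much as a row-key prefix scan would. The names `encode_xml` and `select_children_path` and the label format are hypothetical.

```python
import xml.etree.ElementTree as ET


def encode_xml(xml_text):
    """Map every element to a row keyed by a Dewey-style label.

    Assumption: labels such as "0001.0002" are zero-padded so that
    lexicographic key order matches document order, which is how a
    store with sorted row keys (such as HBase) would return them.
    """
    root = ET.fromstring(xml_text)
    table = {}

    def visit(elem, label):
        # One "row" per node: tag, text, and attributes as columns.
        table[label] = {
            "tag": elem.tag,
            "text": (elem.text or "").strip(),
            "attrs": dict(elem.attrib),
        }
        for i, child in enumerate(elem, start=1):
            visit(child, f"{label}.{i:04d}")

    visit(root, "0001")
    return table


def select_children_path(table, path):
    """Evaluate an absolute child-axis path such as /library/book/title.

    Only the "/" child step is handled; predicates and other XPath
    axes are outside the scope of this sketch.
    """
    steps = [s for s in path.split("/") if s]
    keys = sorted(table)
    # The root row is the only key without a "." separator.
    matches = [k for k in keys if "." not in k and table[k]["tag"] == steps[0]]
    for step in steps[1:]:
        next_matches = []
        for parent in matches:
            prefix = parent + "."
            for key in keys:
                # Direct children share the parent's prefix and add one level.
                is_child = key.startswith(prefix) and "." not in key[len(prefix):]
                if is_child and table[key]["tag"] == step:
                    next_matches.append(key)
        matches = next_matches
    return matches
```

Against a real cluster, the dict lookups would be replaced by HBase reads and the inner loop by a scan restricted to the parent's key prefix; how the thesis parallelizes these steps is not described in the abstract.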
Keywords/Search Tags: XML, Encoding, XPath, Hadoop, HBase