Data Processing Of Complex Structured Data Based On MapReduce

Posted on: 2011-03-05
Degree: Master
Type: Thesis
Country: China
Candidate: Q Ma
GTID: 2178360305497306
Subject: Computer software and theory
Abstract/Summary:
The explosion of data has plagued the computer industry in recent years. Since the emergence of the World Wide Web, and especially with the rapid development of the Internet in China, many types of data have appeared. Traditional centralized databases are still used in most applications; however, a few companies have built large data centers on evolving cloud architectures to replace traditional centralized data management. How to store, index and organize vast amounts of data has therefore become an urgent problem.
Massive data management has attracted wide attention around the world. As the number of applications grows, more and more different data structures emerge. Although many studies of massive data exist, it is very difficult to handle the processing of complex structured data within a single common framework. Given the diverse characteristics of real-world data, appropriate technologies are urgently needed to support applications, especially technologies for processing complex structured data.
In summary, this thesis studies the processing of complex structured data based on MapReduce. Forum graph data and trajectory data of moving objects are the most representative kinds of complex structured data. This thesis introduces a generic model, built on MapReduce, that directly supports complex structured data. The main contributions are:
· Analyze the urgency and feasibility of processing complex structured data with MapReduce. The MapReduce framework is well suited to off-line processing of massive data, but it reads its input as a character stream. A plain character stream cannot express the complexity of a data structure, so the traditional MapReduce framework cannot be used directly to process complex structured data. Yet more and more real-world applications generate complex structured data, so supporting them is urgent.
· Achieve analytic queries over forum graph data based on MapReduce. Forum graph data is one of the most typical kinds of complex structured data. Most existing data analysis techniques focus only on page contents and ignore structural information. As part of CWI (Chinese Web Infrastructure), this thesis presents a new analytic query tool for massive data that covers both contents and structures. The system is implemented in a distributed environment, the four basic operators of TLGM-QL are realized on it, and experiments show good balance and scalability.
· Realize query processing of massive trajectory data based on MapReduce. Trajectory data of moving objects is another of the most typical kinds of complex structured data, and also one of the most typical kinds of sequential data. Preserving the characteristics of sequential data in a distributed system is a major problem. This thesis proposes a general query processing technique for trajectory data and adopts a MapReduce-like framework to balance load across the cluster; experiments show good balance and scalability.
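The point that MapReduce reads its input as a character stream, and that forum data is a graph of replies rather than a flat list, can be illustrated with a small in-memory simulation. The Python sketch below is hypothetical throughout: the tab-separated record format, the `Post` type and the keyword/reply-depth query are illustrative assumptions, not the thesis's TLGM-QL operators. It shows the mapper rebuilding structure from raw text and the reducer answering a query that mixes content and structure.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional

# Hypothetical input format, one forum post per line:
#   post_id <TAB> thread_id <TAB> reply_to ("-" for a thread's first post) <TAB> text
@dataclass
class Post:
    post_id: str
    thread_id: str
    reply_to: Optional[str]
    text: str


def parse_line(line: str) -> Post:
    """Rebuild a structured record from one line of the raw character stream."""
    post_id, thread_id, reply_to, text = line.rstrip("\n").split("\t", 3)
    return Post(post_id, thread_id, None if reply_to == "-" else reply_to, text)


def map_phase(lines: Iterable[str]) -> Iterator[tuple[str, Post]]:
    # Key every post by its thread so that a whole reply tree meets in one reducer.
    for line in lines:
        if line.strip():
            post = parse_line(line)
            yield post.thread_id, post


def shuffle(pairs: Iterable[tuple[str, Post]]) -> dict[str, list[Post]]:
    # Stand-in for the framework's shuffle: group mapper output by key.
    groups = defaultdict(list)
    for key, post in pairs:
        groups[key].append(post)
    return groups


def reduce_thread(thread_id: str, posts: list[Post], keyword: str) -> tuple[str, int, int]:
    """Combine a content predicate (keyword hits) with a structural one (reply depth)."""
    parent = {p.post_id: p.reply_to for p in posts}

    def depth(post_id: str) -> int:
        # Walk reply_to links up to the root; assumes replies form a tree (no cycles).
        d = 0
        while parent.get(post_id) is not None:
            post_id = parent[post_id]
            d += 1
        return d

    hits = sum(keyword in p.text for p in posts)
    max_depth = max(depth(p.post_id) for p in posts)
    return thread_id, hits, max_depth


def run_query(lines: Iterable[str], keyword: str) -> list[tuple[str, int, int]]:
    groups = shuffle(map_phase(lines))
    return sorted(reduce_thread(t, ps, keyword) for t, ps in groups.items())
```

Keying by thread is what lets each reducer see a complete reply tree; a structural query that crosses threads would need a different key or an additional job.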
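One common way to keep sequential data such as trajectories in order across a distributed shuffle is the "secondary sort" pattern: emit a composite key, partition on only part of it, and sort on the whole of it. The Python sketch below simulates that pattern in memory. It is an assumption offered for illustration, not the thesis's own technique; the `object_id,timestamp,x,y` line format, the `trajectory_query` function and the path-length query are all hypothetical.

```python
import math
import zlib
from collections import defaultdict
from itertools import groupby

# Hypothetical input format, one position sample per line: "object_id,timestamp,x,y".


def map_phase(lines):
    for line in lines:
        if not line.strip():
            continue
        obj, t, x, y = line.strip().split(",")
        # Composite key: the object id chooses the partition, the timestamp fixes the order.
        yield (obj, int(t)), (float(x), float(y))


def partition(key, num_reducers):
    # Partition on the object id alone, so each whole trajectory reaches one reducer.
    return zlib.crc32(key[0].encode("utf-8")) % num_reducers


def shuffle(pairs, num_reducers):
    buckets = defaultdict(list)
    for key, value in pairs:
        buckets[partition(key, num_reducers)].append((key, value))
    # Secondary sort: inside each partition, order records by (object_id, timestamp).
    return [sorted(buckets[r]) for r in range(num_reducers)]


def reduce_phase(bucket, t_start, t_end):
    """Yield (object_id, path length travelled within [t_start, t_end])."""
    for obj, records in groupby(bucket, key=lambda kv: kv[0][0]):
        points = [xy for (_, t), xy in records if t_start <= t <= t_end]
        # Points arrive in time order, so consecutive pairs are real trajectory segments.
        yield obj, sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def trajectory_query(lines, t_start, t_end, num_reducers=2):
    results = {}
    for bucket in shuffle(map_phase(lines), num_reducers):
        results.update(reduce_phase(bucket, t_start, t_end))
    return results
```

The input lines need not arrive in time order: the sort on the composite key, not the input order, restores each sequence. Partitioning on the object id alone can skew load when a few objects have very long trajectories, which is one reason load balancing matters for this kind of data.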
Keywords/Search Tags: Distributed Computing, Massive Data, MapReduce, Complex Structure