Massive Data Processing In Complex Application Scenarios

Posted on: 2016-08-13
Degree: Master
Type: Thesis
Country: China
Candidate: Z Dong
GTID: 2308330461487386
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of computer and network technology, more and more of the information produced by human activity is being digitized. The volume of data is growing dramatically, and the data also comes from heterogeneous sources. At the same time, the value of data is increasingly recognized, and people expect to discover useful information and patterns in vast amounts of diverse data. How to process massive data efficiently has therefore become a hot research topic in recent years, attracting considerable attention from both academia and industry.

In general, massive data processing follows one of two patterns: offline processing and online processing. In offline processing, the data has already been stored, so it is static and historical, and the emphasis is on throughput. In online processing, the data flows in continuously, so it is dynamic, and the emphasis is on real-time response. Both patterns have been widely researched in recent years, and many excellent theories and products have emerged.

In this paper, we focus on a general class of application scenarios in which massive data requires both offline and online processing. This paper makes two main contributions:

1) We propose a distributed system architecture for the application scenarios described above. First, the architecture efficiently supports the ingestion of high-speed incoming data from multiple data sources. Second, it supplies consistent data to the downstream offline and online processing modules in a scalable way. Third, it supports aggregating the results of offline and online processing at the application layer.
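The abstract does not describe how the architecture is built, so the following is only a minimal Python sketch of the three listed capabilities under our own assumptions. It is not the author's design. In the sketch, all sources append to one shared, ordered log (`SharedLog`, a hypothetical in-memory stand-in for a distributed store). The offline module recomputes over the stored prefix of that log up to a cutoff offset. The online module incrementally consumes everything after the cutoff, and the application layer merges the two views. Because both modules read the same log and split it at a single offset, every record is counted exactly once. This mirrors the well-known Lambda-architecture pattern, and all class and function names are illustrative.

```python
from collections import Counter


class SharedLog:
    """Hypothetical append-only log: every source writes here, every consumer reads here."""

    def __init__(self):
        self._records = []

    def append(self, source, key):
        # Records from all sources share one global order (their list index = offset).
        self._records.append((source, key))

    def read(self, start, end=None):
        return self._records[start:end]

    def end_offset(self):
        return len(self._records)


def offline_batch(log, cutoff):
    """Offline path: full recomputation over the stored, static prefix [0, cutoff)."""
    return Counter(key for _, key in log.read(0, cutoff))


class OnlineCounter:
    """Online path: incrementally folds in records from `start` onward."""

    def __init__(self, start):
        self.position = start
        self.counts = Counter()

    def poll(self, log):
        new = log.read(self.position)
        for _, key in new:
            self.counts[key] += 1
        self.position += len(new)


def merged_view(batch_counts, online):
    # Application layer: batch covers [0, cutoff) and online covers [cutoff, now),
    # so adding the two views counts every record exactly once.
    return batch_counts + online.counts


if __name__ == "__main__":
    log = SharedLog()
    for source, key in [("web", "a"), ("app", "b"), ("web", "a")]:
        log.append(source, key)
    cutoff = log.end_offset()             # offline job snapshots the first 3 records
    batch = offline_batch(log, cutoff)
    online = OnlineCounter(start=cutoff)  # online job starts exactly where batch stops
    log.append("sensor", "b")
    log.append("web", "c")
    online.poll(log)
    print(merged_view(batch, online))
```

Splitting at an explicit offset, rather than by timestamp, is one simple way to keep the two paths consistent. A production system would replace the in-memory list with a partitioned, replicated log, and that substitution is likewise an assumption of this sketch.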
We give a theoretical analysis of the soundness of the architecture and demonstrate its effectiveness through experiments and practical application.

2) We propose a decentralized method for task assignment in a distributed environment. The method has no central node: every participating node runs the assignment algorithm independently. This avoids the system failure that a master-slave method suffers when its central node encounters an exception. We also analyze the effectiveness of the method theoretically and compare it with a centralized method.

The architecture and method proposed in this paper have been used in a real production environment for a long period of time, demonstrating their practical value.
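The abstract does not name the decentralized assignment algorithm. One well-known technique with the stated property is rendezvous (highest-random-weight) hashing: every node runs the same algorithm independently and no master is consulted. The Python sketch below uses it purely as an illustration and should not be read as the method in this paper. It assumes every node sees the same list of live nodes and the same task list, and how that shared view is maintained is outside the sketch. Each node scores every (task, node) pair with a stable hash and keeps the tasks for which it has the highest score. Because the computation is deterministic, all nodes reach the same assignment without exchanging messages.

```python
from __future__ import annotations

import hashlib


def score(task: str, node: str) -> int:
    """Stable pseudo-random weight for a (task, node) pair.

    hashlib is used instead of the built-in hash(), whose string hashing is
    randomized per process and would make nodes disagree.
    """
    digest = hashlib.sha256(f"{task}|{node}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def owner(task: str, nodes: list[str]) -> str:
    """Rendezvous hashing: the node with the highest score owns the task.

    Ties (vanishingly unlikely) are broken by node name so the result stays deterministic.
    """
    return max(nodes, key=lambda node: (score(task, node), node))


def my_tasks(self_id: str, tasks: list[str], nodes: list[str]) -> list[str]:
    """Run independently on each node; no coordinator is consulted."""
    return sorted(t for t in tasks if owner(t, nodes) == self_id)


if __name__ == "__main__":
    nodes = ["node-1", "node-2", "node-3"]  # hypothetical shared membership view
    tasks = [f"task-{i}" for i in range(12)]
    for node in nodes:
        print(node, my_tasks(node, tasks, nodes))
```

This choice has a useful side effect. When a node leaves, only the tasks it owned are reassigned, because removing a loser from the candidate set cannot change which node had the highest score for any other task.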
Keywords/Search Tags: massive data processing, architecture, decentralized, task assignment