Cross-Platform Distributed Streaming Data Migration System For Big Data

Posted on: 2021-03-17
Degree: Master
Type: Thesis
Country: China
Candidate: L Lu
GTID: 2428330647451053
Subject: Computer Science and Technology
Abstract/Summary:
With the explosive growth of data volume, people have entered the era of big data. In this era, multiple computing models (basic statistical analysis, machine learning, etc.) have been proposed to address the diversity of data processing problems. At the same time, a complex real-world industrial big data analysis application often needs to mix several computing models. However, no single big data computing platform supports all computing models well. For these reasons, cross-platform data processing has been widely studied, and cross-platform data migration has become a problem that must be solved. On one hand, to migrate data efficiently between platforms, a data migration system needs to support a unified data exchange format and streaming data migration. On the other hand, to efficiently support distributed parallel computing on big data, the data migration system should also adopt a distributed architecture and support load balancing and data fault tolerance. However, existing data migration work cannot fully meet these requirements. Given this background and these requirements, this paper studies and designs Crossroad, a cross-platform distributed streaming data migration system for big data that meets the above data migration requirements. The main work and contributions of this paper are as follows: (1) This paper studies and analyzes the impact of the data exchange format on cross-platform data migration performance, and reduces the cost of data format conversion during cross-platform migration by selecting an efficient unified data exchange format. (2) This paper proposes a data migration method based on a file queue. This method not only allows data to be exported and imported simultaneously but also saves data to the file system for data fault tolerance. (3) After studying the problems of streaming data migration in distributed scenarios, this paper proposes solutions including a routing mechanism for transferring data, a data shuffle strategy for load balancing, and a batch selection strategy for efficiency. (4) Based on the above key technologies, this paper designs and implements an efficient cross-platform data migration prototype system called Crossroad. Crossroad supports the efficient migration of data between different big data computing platforms, which is important for cross-platform data processing. Experimental results demonstrate that the proposed technologies and system effectively support data migration in big data scenarios and perform well.
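The abstract does not give implementation details of the file-queue method, so the following is only a minimal illustrative sketch of the general idea in contribution (2): an exporter appends records to a file while an importer reads them concurrently, and the records remain on disk so they can be replayed after a failure. The class name `FileQueue`, the JSON-lines encoding, the `__EOF__` sentinel, and the polling reader are all assumptions made for illustration and are not taken from Crossroad.

```python
import json
import os
import tempfile
import threading
import time


class FileQueue:
    """Append-only file used as a queue (hypothetical, not Crossroad's API)."""

    EOF_MARK = "__EOF__"  # assumed end-of-stream sentinel

    def __init__(self, path):
        self.path = path
        open(path, "a").close()  # make sure the file exists

    def put(self, record):
        # Append one record as a JSON line; closing the file flushes it,
        # so a concurrent reader can see the record.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")

    def close(self):
        # Mark the end of the stream for readers.
        self.put(self.EOF_MARK)

    def read_from(self, offset=0, poll=0.01):
        """Yield (record, next_offset) pairs starting at a byte offset."""
        with open(self.path, "rb") as f:
            f.seek(offset)
            while True:
                line = f.readline()
                if not line.endswith(b"\n"):
                    # No complete line yet: rewind and wait for the writer.
                    f.seek(offset)
                    time.sleep(poll)
                    continue
                offset = f.tell()
                record = json.loads(line)
                if record == self.EOF_MARK:
                    return
                yield record, offset


def run_demo(n=5):
    path = os.path.join(tempfile.mkdtemp(), "queue.jsonl")
    queue = FileQueue(path)
    received = []

    def exporter():
        for i in range(n):
            queue.put({"id": i})
        queue.close()

    def importer():
        for record, _ in queue.read_from(0):
            received.append(record)

    # Export and import run at the same time.
    t_in = threading.Thread(target=importer)
    t_out = threading.Thread(target=exporter)
    t_in.start()
    t_out.start()
    t_out.join()
    t_in.join()

    # Records are still on disk, so a restarted importer can replay them.
    replayed = [record for record, _ in queue.read_from(0)]
    return received, replayed
```

In this sketch, an importer that saved the last `next_offset` it processed could resume from that offset after a crash instead of replaying from the start; a real system would also need to handle file rotation and cleanup, which are omitted here.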
Keywords/Search Tags:Distributed computing system, Cross-platform data processing, Data exchange format, Streaming data migration