
Research And Optimization Of Data High Availability Deployment Strategy Based On Alluxio

Posted on: 2019-11-21
Degree: Master
Type: Thesis
Country: China
Candidate: X J Xu
Full Text: PDF
GTID: 2428330566496857
Subject: Computer technology
Abstract/Summary:
As distributed file systems for storing massive amounts of data have matured, the concept of the in-memory distributed file system has emerged to meet the data access speed requirements of various organizations and academic institutions. With the gradual development of in-memory distributed file systems, represented by the open-source Alluxio, this concept has been put into practice. Although such systems are now widely applied, they have also exposed many problems. This thesis takes the availability of massive-data systems as its research objective, compares and analyzes the mechanisms that other systems use to ensure data reliability, and, based on the characteristics of Alluxio, proposes improvements to its data deployment strategy in order to improve the availability of Alluxio itself and provide better support for upper-level computing frameworks.

Alluxio, as the middle layer of the big data ecosystem, links the upper computing frameworks with the underlying storage systems. While it is serving reads from an upper-layer computing framework, a failure of the underlying storage system or of the network connecting to it means that Alluxio can no longer rely on the underlying storage to guarantee the reliability of its data, which is very dangerous. In addition, given the high cost of the remote calls needed to maintain data consistency, it is necessary to establish or improve a data protection mechanism inside Alluxio.

This thesis proposes an improvement to Alluxio that consists of two main points. First, data is classified according to its heat, and hot data is pinned in memory through Alluxio's tiered storage, which improves response efficiency and the overall execution efficiency of the system. Second, with the data block as the unit of granularity, an appropriate replica coefficient is set for each block; combined with Alluxio's parallel access, this further improves the access efficiency of hot data, reduces the storage space occupied by cold data, and ensures the availability of the system through redundancy. Ideally, most of the data held in the system's own storage is hot data, supplemented by other commonly used data that is accessed frequently, so that when Alluxio encounters problems such as the loss of the underlying storage or the failure of one of its own nodes, it can still serve the upper computing framework and keep the system running until the fault has been repaired.

Based on these optimization ideas, and after studying the Alluxio source code, this thesis builds two modules outside Alluxio: a calculation module that classifies data blocks according to their predicted read frequency, and a dynamic adjustment module that manages the number of replicas. Inside Alluxio, it rewrites the data deployment strategy, adds tiered persistence of data, monitors anomalies, and implements a fault handling module.

Finally, this thesis tests the completed strategy with an access algorithm that mimics a realistic data access distribution. Comparison with other strategies verifies that the proposed strategy improves system response efficiency and reduces system load. A failure simulation, followed by an analysis of task execution, also verifies that the strategy improves the availability of the system.
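The two points of the proposed strategy (classifying blocks by heat, then choosing a replica count per block) can be illustrated with a minimal sketch. It is not the thesis implementation and does not use any Alluxio API: the function name plan_blocks, the BlockPlan record, the three heat classes, the thresholds, and the replica counts are all hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class BlockPlan:
    """Hypothetical placement decision for one data block."""
    block_id: str
    heat: str            # "hot", "warm", or "cold" (illustrative classes)
    replicas: int        # number of copies to keep inside the cache layer
    pin_in_memory: bool  # whether the block should stay in the memory tier


def plan_blocks(
    predicted_reads: Dict[str, float],
    hot_threshold: float = 100.0,   # assumed cut-off, not from the thesis
    warm_threshold: float = 10.0,   # assumed cut-off, not from the thesis
    max_replicas: int = 3,          # assumed upper bound on copies
) -> List[BlockPlan]:
    """Classify blocks by predicted read frequency and assign replica counts.

    Hot blocks get the most replicas and are pinned in memory; cold blocks
    keep a single copy so that they occupy as little space as possible.
    """
    plans = []
    for block_id, freq in sorted(predicted_reads.items()):
        if freq >= hot_threshold:
            plans.append(BlockPlan(block_id, "hot", max_replicas, True))
        elif freq >= warm_threshold:
            plans.append(BlockPlan(block_id, "warm", min(2, max_replicas), False))
        else:
            plans.append(BlockPlan(block_id, "cold", 1, False))
    return plans


if __name__ == "__main__":
    # Predicted reads per time window for three example blocks.
    for plan in plan_blocks({"b1": 500.0, "b2": 20.0, "b3": 1.0}):
        print(plan)
```

In a real deployment the predicted frequencies would come from the read-prediction module described above, and the resulting plan would be re-evaluated periodically so that replica counts follow changes in data heat.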
Keywords/Search Tags:Alluxio, data deployment, hot data, dynamic replica, data availability