Research On The Problem Of Data Consistency Of Distributed File System In The Cloud Computing Environment

Posted on: 2015-01-15, Degree: Master, Type: Thesis
Country: China, Candidate: L F Qiao, Full Text: PDF
GTID: 2308330473953237, Subject: Computer software and theory
Abstract/Summary:
With the growth of enterprise informatization and the mobile internet, traditional computing and storage models can no longer meet growing business needs. Cloud computing was proposed in this context: it is a commercial development of distributed computing, parallel computing, and grid computing, and it provides elastically scalable services. Cloud storage is an important component of the cloud computing service architecture, providing a scalable, highly fault-tolerant storage service. The distributed file system underpins the cloud storage system, and its performance directly affects the capability of the cloud storage service. To improve reliability and performance, distributed file systems use replication and caching; however, replication and caching can introduce data consistency problems. Several consistency models, each approaching the problem from a different angle, can be used to address them. HDFS is a distributed file system designed for large data sets and high-throughput applications. Its data storage nodes store data blocks redundantly to ensure system scalability and reliability. However, HDFS updates the replicas of a data block through a pipeline, and in interactive application scenarios this strong consistency policy causes a sharp decline in service performance.

This thesis starts by analyzing the characteristics of cloud storage services for individual users and the usage habits of those users, and studies the technical architecture of HDFS. On that basis it designs and implements an HDFS-based cloud storage system that uses a configurable data consistency strategy to improve availability, and uses client-side caching together with merged storage and access of small files to improve overall performance. The main work of this thesis includes: analyzing the data synchronization models that can be used to resolve consistency problems in distributed systems; studying the data synchronization model used by HDFS, pointing out its shortcomings for personal cloud storage services, and proposing and implementing an NWR-based solution whose read and write synchronization model is configurable; adding a cache module to the native HDFS client interface, since HDFS provides no client-side caching, in order to improve scalability and reduce access pressure on the server side; and designing and implementing a small-file solution that greatly improves storage and access efficiency, addressing the NameNode memory bottleneck that degrades performance when HDFS handles a large number of small files. Finally, testing and analysis show that the availability and service performance of the system are improved.
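The abstract does not spell out the NWR model, so the following is only a minimal sketch of the general idea, not the thesis's implementation. It assumes the usual reading of NWR: N replicas per item, a write is acknowledged by W of them, and a read consults R of them. The class `NWRStore` and its methods are hypothetical names, the replicas are plain in-memory dictionaries, and the choice of which replicas are written and read is deliberately the worst case for the reader.

```python
class NWRStore:
    """Toy NWR store (hypothetical, in-memory; not HDFS code)."""

    def __init__(self, n, w, r):
        if not (1 <= w <= n and 1 <= r <= n):
            raise ValueError("need 1 <= W <= N and 1 <= R <= N")
        self.n, self.w, self.r = n, w, r
        # Each replica maps key -> (version, value).
        self.replicas = [{} for _ in range(n)]
        self.version = 0

    def overlaps(self):
        """True when every read set must intersect every write set."""
        return self.w + self.r > self.n

    def put(self, key, value):
        """Write synchronously to W replicas only (assumed worst case:
        the remaining N - W replicas have not been updated yet)."""
        self.version += 1
        for replica in self.replicas[: self.w]:
            replica[key] = (self.version, value)

    def get(self, key):
        """Read R replicas, chosen as far from the written ones as
        possible, and return the value with the highest version."""
        chosen = self.replicas[self.n - self.r:]
        answers = [replica.get(key, (0, None)) for replica in chosen]
        return max(answers, key=lambda answer: answer[0])[1]


# W + R > N: the read set always contains an up-to-date replica.
strong = NWRStore(n=3, w=2, r=2)
strong.put("file.txt", "v1")
strong.put("file.txt", "v2")
latest = strong.get("file.txt")

# W + R <= N: writes return sooner, but a read may miss the update.
weak = NWRStore(n=3, w=1, r=1)
weak.put("file.txt", "v1")
stale = weak.get("file.txt")
```

Under these assumptions, choosing W and R so that W + R > N gives reads that always see the latest acknowledged write, while smaller values trade that guarantee for lower write and read latency, which is the kind of configurable trade-off the abstract describes.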
Keywords/Search Tags:Data Consistency, HDFS, Cache System, Small File Storing, NWR Model