
Design and Implementation of a Hadoop-Based File Synchronization and Storage System

Posted on: 2013-02-11
Degree: Master
Type: Thesis
Country: China
Candidate: W J Liu
GTID: 2248330374985983
Subject: Computer software and theory
Abstract/Summary:
In the cloud computing era, data storage and backup technology has become closely tied to the lives of individuals and the operation of organizations, both of which face the problem of managing massive amounts of data. The development of cloud storage and its related technologies has brought innovation to the field of data storage. An online storage system built on cloud storage technology can provide a permanent, convenient, and inexpensive storage service with scalable storage space. Currently, the mature products in China include Jinshan Fast Disk and Huawei Network Disk. They provide stable functions such as data storage and file synchronization, but they also have functional shortcomings. First, the file monitoring function in the clients of these systems is incomplete. Second, file synchronization efficiency is relatively low in some cases. Third, some services provide neither secure data transmission nor network transmission schemes tailored to different file synchronization events. Finally, encrypted storage of data on the client and server sides receives little attention. Online data storage service providers should also consider optimizing the cloud-based data storage platform, because the platform is the foundation of a service that handles massive amounts of data.

This thesis summarizes the problems of current online file synchronization and storage services from the user's point of view. To solve these problems, it studies the key technologies for implementing a file synchronization and storage service, and designs and implements a file synchronization and storage system based on the Rsync synchronization algorithm and HDFS, the distributed file system in Hadoop.
The main work of this thesis includes: analyzing the advantages and disadvantages of domestic and foreign products and identifying user requirements; monitoring file changes in the client's virtual disk in real time using the open source jpathwatch library; implementing triggers and notifications for different types of synchronization events; improving the library's functions to monitor file renames and moves; processing events with different characteristics using different methods; designing, for file content update events and file resume events in particular, a synchronization protocol based on the rsync algorithm that reduces the data transmitted during file synchronization and improves synchronization efficiency; designing the best transmission method for each type of synchronization event; using the HTTPS protocol to encrypt data in transit; and building a Hadoop-based data storage platform.

The system is designed and implemented using a layered, modular approach. The functional modules of the system are tested and analyzed, and the final chapters summarize the research results and the system's optimization features and outline plans for further work.
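The abstract does not describe the thesis's synchronization protocol in detail, so the following is only a minimal Python sketch of the general rsync idea it builds on: the receiver publishes per-block checksums of its old copy, and the sender transmits block references where the new file matches and literal bytes where it does not. The function names, the tiny block size, and the use of SHA-256 as the strong hash are all assumptions made for illustration, not the author's design.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical, tiny block size chosen only for illustration


def weak_checksum(block: bytes) -> int:
    """Adler-style weak checksum. Recomputed per window here for brevity;
    real rsync updates it incrementally as the window slides."""
    a = sum(block) % 65536
    b = sum((len(block) - i) * byte for i, byte in enumerate(block)) % 65536
    return (b << 16) | a


def signature(old: bytes, block_size: int = BLOCK_SIZE) -> dict:
    """Receiver side: map weak checksum -> [(strong hash, block index)]
    for every full block of the old file."""
    table = {}
    for index in range(len(old) // block_size):
        block = old[index * block_size:(index + 1) * block_size]
        entry = (hashlib.sha256(block).digest(), index)
        table.setdefault(weak_checksum(block), []).append(entry)
    return table


def make_delta(new: bytes, sig: dict, block_size: int = BLOCK_SIZE) -> list:
    """Sender side: emit ("copy", block_index) for matched blocks and
    ("data", literal_bytes) for everything else."""
    delta, literal, pos = [], bytearray(), 0
    while pos + block_size <= len(new):
        window = new[pos:pos + block_size]
        match = None
        candidates = sig.get(weak_checksum(window))
        if candidates:
            # Confirm a weak-checksum hit with the strong hash.
            strong = hashlib.sha256(window).digest()
            for known_strong, index in candidates:
                if known_strong == strong:
                    match = index
                    break
        if match is None:
            literal.append(new[pos])  # no match: slide the window by one byte
            pos += 1
        else:
            if literal:
                delta.append(("data", bytes(literal)))
                literal.clear()
            delta.append(("copy", match))
            pos += block_size
    literal.extend(new[pos:])  # trailing bytes shorter than one block
    if literal:
        delta.append(("data", bytes(literal)))
    return delta


def apply_delta(old: bytes, delta: list, block_size: int = BLOCK_SIZE) -> bytes:
    """Receiver side: rebuild the new file from the old copy plus the delta."""
    out = bytearray()
    for kind, value in delta:
        if kind == "copy":
            out += old[value * block_size:(value + 1) * block_size]
        else:
            out += value
    return bytes(out)
```

In this sketch only the `("data", ...)` entries carry file content over the network, which is where the reduction in transmitted data comes from when most of a file is unchanged. A production protocol would use far larger blocks and a truly rolling checksum.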
Keywords/Search Tags: Hadoop, file synchronization, file system monitoring, Rsync algorithm