
Research On Parallel Downloading Technology In Distributed Storage System

Posted on: 2012-12-30
Degree: Master
Type: Thesis
Country: China
Candidate: Q X Wu
Full Text: PDF
GTID: 2178330338492010
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
As informatization accelerates, the volume and variety of datasets that need to be stored keep growing. Storage systems are undergoing revolutionary change, most visibly in the rapid expansion of storage capacity, and this growth poses a great challenge to storage system design. Traditional single-point centralized storage based on the client/server model can no longer meet current storage needs, and distributed storage systems have developed quickly in response. In a distributed storage system, datasets are usually stored as multiple replicas in order to improve system reliability and data availability. Because a dataset has multiple copies in the system, how to retrieve it quickly from the storage system has become a focus of academic research. At present, two methods can be used to obtain the required data quickly. One is to use a server selection algorithm to choose the best server from which to download the dataset. The other is to use multi-node collaborative parallel downloading. Because multi-node collaborative parallel downloading can take full advantage of the bandwidth of all servers and avoids a complex server selection algorithm, a storage system that uses this method has obvious advantages. This thesis studies the technologies related to multi-node collaborative parallel downloading in storage systems. Its work and contributions can be summarized as follows:

1. A distributed storage architecture is proposed, the related infrastructure technologies are studied, and the background of the parallel algorithm used in the system is analyzed.
2. The mechanisms of network bandwidth and latency are analyzed in detail, and network measurement tools are used to analyze the impact of network delay and packet loss on bandwidth. The impact of TCP flow control and congestion control on bandwidth is analyzed on the ns2 simulation platform.
3. The mechanisms of common parallel sockets are analyzed in detail. The multi-streaming and multi-homing features of SCTP are analyzed, and an improved SCTP-based FTP is proposed. SCTP-based FTP is compared with TCP-based FTP and shows a higher download speed. In addition, the disadvantages of using poll in Psock and the advantages of using epoll instead are analyzed.
4. A multi-node cooperative parallel downloading algorithm based on bandwidth measurement is proposed. The final block of the file being downloaded is adjusted dynamically and re-assigned across all servers, so that the multiple concurrent download streams finish at almost the same time. In addition, server-side caching is used in the parallel downloading algorithm to reduce hard-disk reads on the server side and to speed up downloads.
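The goal of having all concurrent streams finish at about the same time can be illustrated with a minimal sketch. It is not the thesis's algorithm, which adjusts the final block dynamically during the download; it only shows a static split in which each server receives a byte range proportional to its measured bandwidth. The function name `split_by_bandwidth` and its parameters are hypothetical, chosen for illustration.

```python
def split_by_bandwidth(total_bytes, bandwidths):
    """Split a file into contiguous (offset, length) byte ranges, one per server.

    Hypothetical helper: each range is proportional to the server's measured
    bandwidth, so length / bandwidth (the estimated transfer time) is roughly
    equal for every server.
    """
    if total_bytes < 0:
        raise ValueError("total_bytes must be non-negative")
    if not bandwidths or any(bw <= 0 for bw in bandwidths):
        raise ValueError("every server needs a positive measured bandwidth")

    total_bw = sum(bandwidths)
    ranges = []
    offset = 0
    for i, bw in enumerate(bandwidths):
        if i == len(bandwidths) - 1:
            # The last server takes whatever rounding left over.
            length = total_bytes - offset
        else:
            length = int(total_bytes * bw / total_bw)
        ranges.append((offset, length))
        offset += length
    return ranges


if __name__ == "__main__":
    # Three replicas with measured bandwidths in arbitrary but equal units.
    for offset, length in split_by_bandwidth(1000, [10, 30, 60]):
        print(offset, length)
```

In a real system the measured bandwidths drift during the transfer, which is presumably why the abstract describes re-assigning the final block rather than relying on a single up-front split like this one.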
Keywords/Search Tags: distributed storage, bandwidth and delay measurement, parallel sockets, multi-node cooperation, parallel downloading