
On The Hadoop Based Distributed Storage Techniques And Its Applications In Content Dissemination Design

Posted on: 2016-06-27
Degree: Master
Type: Thesis
Country: China
Candidate: T L Chen
GTID: 2308330461969406
Subject: Communication and Information System
Abstract/Summary:
With the rapid development of computer technology, the growth of networks and the popularity of smart devices, information processing and network services now have a significant impact on everyone. In recent years especially, network technologies such as P2P, social networks, the mobile Internet, the Internet of Things, e-commerce and multimedia sharing have not only brought great convenience but also caused explosive growth in global data. Jim Gray, a winner of the ACM Turing Award, put forward an empirical law for worldwide data growth: the amount of data produced every 18 months equals the total amount produced in all prior history. How to store, manage and use such a large amount of data has become a pressing problem of real practical significance.

Industry applications of all kinds now demand high-quality storage systems. Distributed storage systems have attracted much attention because of their low cost and high scalability, and they have become the primary choice for mass data storage. However, because node availability in a distributed storage system is not high, the system must frequently repair failed nodes in order to guarantee data reliability. This thesis builds a Hadoop test cluster on OpenStack that uses four storage policies: replication, XOR, RS and SR. Through theoretical analysis, we compare the storage cost, repair network cost and single-file reliability of the four policies. The actual performance of each policy is then measured through tests on the Hadoop cluster and compared with the theoretical results. Finally, based on the test results, the characteristics of each policy and a comprehensive analysis, suitable application scenarios are given for each of the four storage policies.

A distributed storage system has the characteristic of "write once, read multiple times". When a user reads a file from the distributed file system, the data blocks must be downloaded from different nodes to reconstruct the original file. For popular files in particular, reads at peak time mean that the "read multiple times" characteristic can cause network congestion, which affects not only the availability of files but possibly also their reliability. In this thesis, we combine distributed storage technology with the idea of a content delivery network (CDN): through research on a content dissemination strategy based on a distributed storage system, the communication problem is converted into a distributed storage problem. Peak-time network congestion is alleviated by increasing storage overhead within the framework of the distributed storage system. Finally, an experiment on a Hadoop cluster is used as an example to verify the feasibility of the data dissemination strategy.
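The theoretical comparison of storage cost, repair network cost and single-file reliability mentioned in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the class and helper names are hypothetical, RS is assumed to denote a Reed-Solomon code with k data blocks and m parity blocks, the parameters (3-way replication, XOR with 2 data blocks, RS with 6 data and 3 parity blocks) and the node availability are arbitrary, reliability is computed for a single stripe with independent node failures, and SR is omitted because the abstract does not define its parameters.

```python
from dataclasses import dataclass
from math import comb


@dataclass(frozen=True)
class Policy:
    """Hypothetical per-stripe model of a storage policy (illustration only)."""
    name: str
    data_blocks: int    # k: blocks holding original data in one stripe
    total_blocks: int   # n: blocks actually stored for that stripe
    repair_reads: int   # blocks downloaded to rebuild one lost block

    @property
    def storage_overhead(self) -> float:
        # Stored bytes divided by original bytes.
        return self.total_blocks / self.data_blocks

    @property
    def tolerated_failures(self) -> int:
        # All three schemes modelled here survive any n - k lost blocks.
        return self.total_blocks - self.data_blocks

    def stripe_reliability(self, p: float) -> float:
        # Probability that at most (n - k) of the n nodes are unavailable,
        # assuming each node is independently available with probability p.
        n = self.total_blocks
        return sum(
            comb(n, i) * (1 - p) ** i * p ** (n - i)
            for i in range(self.tolerated_failures + 1)
        )


def replication(copies: int) -> Policy:
    # One data block stored `copies` times; repair copies one surviving replica.
    return Policy(f"replication x{copies}", 1, copies, 1)


def xor_parity(k: int) -> Policy:
    # k data blocks plus one XOR parity block; repair reads the k other blocks.
    return Policy(f"XOR({k}+1)", k, k + 1, k)


def reed_solomon(k: int, m: int) -> Policy:
    # k data blocks plus m parity blocks; conventional repair reads k blocks.
    return Policy(f"RS({k}+{m})", k, k + m, k)


if __name__ == "__main__":
    availability = 0.9  # assumed per-node availability, not a measured value
    for policy in (replication(3), xor_parity(2), reed_solomon(6, 3)):
        print(
            f"{policy.name:16s} overhead={policy.storage_overhead:.2f} "
            f"repair_reads={policy.repair_reads} "
            f"reliability={policy.stripe_reliability(availability):.6f}"
        )
```

Under these assumptions the sketch exposes the trade-off the abstract refers to: replication repairs a lost block by reading a single copy but stores every byte several times, while the coded policies store less but read more blocks over the network for each repair.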
Keywords/Search Tags: Distributed storage system, Hadoop, Regenerating Code, Data delivery strategy