
Research on Massive Data Storage and Management in a Cluster Environment

Posted on: 2011-09-11
Degree: Master
Type: Thesis
Country: China
Candidate: X Q Hu
GTID: 2178360305478424
Subject: Computer software and theory
Abstract/Summary:
In domains such as oil exploration and remote sensing, data files frequently reach the terabyte (TB) scale. A single storage device often lacks the capacity to hold such a file, and the only remedy is to add new storage devices. Many current technologies can present multiple disk arrays as one virtual disk to meet TB-level storage needs. However, this does not avoid the "edge" data problem in storage systems built from multiple devices, namely that the remaining capacity of one disk system can hold only part of a seismic data set. Second, different kinds of storage devices (such as tape drives) use different storage methods, so massive data cannot be stored in a unified and efficient way. Instead, it must be transcribed between devices, which lowers the storage efficiency of the devices and greatly affects enterprise efficiency. In a cluster environment, massive data storage also depends on efficient task scheduling among the nodes: the more balanced the use of resources, the shorter the response time of operations.
Therefore, a suitable task scheduling algorithm among the nodes plays an important role in shortening the average response time, improving the utilization of node resources, and thereby improving the storage performance for massive data. For these reasons, the relevant fields need a model and mechanism for managing massive data in a cluster environment. In this model system, storage devices of various media are placed under unified storage, massive data is stored across disks and across media, and an efficient task scheduling algorithm is used to reduce the average response time of operations and increase storage efficiency. This thesis proposes the corresponding cross-disk storage method and a test scheme for the scheduling algorithm, and implements a prototype. The main research contents are as follows:

(1) Based on a study of specific massive data storage applications in oil companies, the thesis analyzes the storage characteristics of tape drives and other storage media. Using pipeline technology, process mechanisms, and calls to the underlying I/O system, it shields the heterogeneity of the storage devices and proposes two sets of unified access interfaces for them, ultimately achieving unified storage across devices of multiple media types. The two solutions are compared in terms of data security and buffer size.

(2) Oil companies storing massive data encounter files that cannot be stored because a single storage device lacks sufficient capacity. To address this, the thesis reviews the current state of research on cross-disk storage of massive data. Using the underlying file I/O access interface, it proposes storage and access mechanisms for massive data across disks, including a set of low-level cross-disk file read and write interfaces and a corresponding prototype system for cross-disk configuration and operation. It achieves cross-disk storage of massive data and tests the corresponding access interfaces.

(3) Shortening the average response time of operations plays an important role in improving enterprise efficiency. The thesis analyzes the basic dynamic load balancing algorithms and, combining them with the Weighted Round-Robin method, proposes a load-balancing scheduling algorithm. An experimental comparison of algorithm performance shows that this algorithm has a short response time, performs load balancing relatively few times, and has low overhead, and so ultimately improves working efficiency.
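To make the Weighted Round-Robin idea in item (3) concrete, here is a minimal Python sketch of one well-known variant, smooth weighted round-robin. It is an illustration only and is not the scheduling algorithm proposed in the thesis, which the abstract does not specify. The names `Node`, `pick_node`, and `schedule`, and the example weights, are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """A cluster node; `weight` is an assumed static capacity score."""
    name: str
    weight: int
    current: int = 0  # running score used by the smooth WRR selection


def pick_node(nodes):
    """Select the next node using smooth weighted round-robin.

    Each call raises every node's running score by its weight, picks the
    node with the highest score, then lowers the winner's score by the
    total weight so that heavier nodes are chosen more often but not in
    one long burst.
    """
    total = sum(n.weight for n in nodes)
    for n in nodes:
        n.current += n.weight
    chosen = max(nodes, key=lambda n: n.current)
    chosen.current -= total
    return chosen


def schedule(nodes, task_count):
    """Return the node name assigned to each of `task_count` tasks."""
    return [pick_node(nodes).name for _ in range(task_count)]


if __name__ == "__main__":
    # Hypothetical cluster: node "a" is rated five times as capable.
    cluster = [Node("a", 5), Node("b", 1), Node("c", 1)]
    print(schedule(cluster, 7))
```

Over any window of seven consecutive tasks, node "a" receives five and the other two nodes receive one each. A dynamic scheme of the kind the abstract describes would presumably adjust the weights from measured node load rather than keep them fixed, but that step is not shown here.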
Keywords/Search Tags:Mass storage, File operation, Pipeline, System call, Scheduling, Load Balancing