Research On Theory And Methods Of Data Placement Optimization In Distributed Storage

Posted on: 2016-12-20
Degree: Doctor
Type: Dissertation
Country: China
Candidate: T Wang
Full Text: PDF
GTID: 1368330482959127
Subject: Communication and Information System

Abstract/Summary:
Computation, transmission, and storage are the three cornerstones of information infrastructure. With advances in computer and Internet technologies, the core of an information system is no longer computation but data, which has become a foundational resource rather than merely an object of processing. In the age of big data, managing huge volumes of distributed, shared data resources is increasingly important for data-intensive applications. The emergence of distributed data storage creates opportunities for distributed data management but also poses challenges, especially in data placement, which has a significant influence on access performance. Hence, research on data placement optimization in distributed data storage has great theoretical value and practical significance for improving the access performance and availability of a system.

Building on current research progress in distributed data placement, this dissertation studies the theory and methods of data placement optimization, with the aims of improving the access performance and availability of distributed storage systems, shortening the response time of data-intensive applications, and providing fast and reliable data services to users. Its main contributions are:

(1) A systematic theoretical model of data placement optimization for distributed data storage. Through a requirements analysis of data placement, the dissertation studies the key factors that affect access performance and the correlations among data objects, compute tasks, and data centers. By exploring the connection between data placement optimization and access delay, it proposes a systematic theoretical model of data placement optimization for distributed data storage, which provides a theoretical basis for data placement optimization algorithms.

(2) A multi-objective optimization algorithm for data placement. Through an analysis of the correlation between data objects at the application level, a multi-objective optimization algorithm for data placement is proposed based on the theoretical model. The algorithm provides a real-time, online data placement optimization method that can obtain a reasonable optimal solution in deterministic polynomial time and strikes a balance between load balancing and minimizing data scheduling.

(3) A dynamic data placement optimization strategy, comprising a robust replication algorithm, a self-adaptive online data migration algorithm, and an effective small-file access algorithm. Through an analysis of the influence of data replicas, data migration, and small files on access delay, the strategy is proposed based on the theoretical model. The replication algorithm creates and manages replicas dynamically, accurately, and effectively; the data migration algorithm automatically migrates data according to system status and user access patterns; and the small-file access algorithm reduces the number of I/O operations for massive numbers of small files and lowers I/O delay, to suit the data processing mode of data-intensive applications. Together, these algorithms provide intelligent, high-performance data placement optimization for distributed data storage.

This dissertation studies how to improve access performance and reduce access delay in storage systems from the perspective of data placement optimization. The theoretical model of data placement that it constructs can serve as a theoretical reference for the design and implementation of data placement optimization algorithms. The proposed data placement, data replication, data migration, and small-file access algorithms can intelligently, effectively, and reliably improve system performance, from macro to micro and from whole to part, and meet the requirements of data management in distributed data storage.
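
The abstract does not give the details of the multi-objective placement algorithm, so the following is only a minimal sketch of the general idea of trading off load balancing against data movement. It is not the dissertation's method. It assumes a simple greedy, weighted-sum heuristic, and every name in it (`place_objects`, `sizes`, `access`, `centers`, `alpha`) is hypothetical and introduced here for illustration.

```python
from typing import Dict, List


def place_objects(
    sizes: Dict[str, int],
    access: Dict[str, Dict[str, int]],
    centers: List[str],
    alpha: float = 0.5,
) -> Dict[str, str]:
    """Greedy weighted-sum placement (illustrative assumption, not the thesis algorithm).

    sizes:   object name -> size in arbitrary units
    access:  object name -> {data center -> number of reads issued from it}
    centers: candidate data centers
    alpha:   weight on load balance; (1 - alpha) is the weight on transfer cost
    """
    load = {c: 0 for c in centers}
    total = sum(sizes.values()) or 1  # avoid division by zero for empty input
    placement: Dict[str, str] = {}

    # Place large objects first; break ties by name so the result is deterministic.
    for obj in sorted(sizes, key=lambda o: (-sizes[o], o)):
        size = sizes[obj]
        reads = access.get(obj, {})
        total_reads = sum(reads.values())
        best_center, best_cost = None, float("inf")
        for c in centers:
            # Share of all data that center c would hold after taking this object.
            balance_cost = (load[c] + size) / total
            # Share of this object's reads that would cross data centers.
            remote = total_reads - reads.get(c, 0)
            transfer_cost = remote / total_reads if total_reads else 0.0
            cost = alpha * balance_cost + (1 - alpha) * transfer_cost
            if cost < best_cost:  # strict comparison keeps the first center on ties
                best_center, best_cost = c, cost
        placement[obj] = best_center
        load[best_center] += size
    return placement


if __name__ == "__main__":
    sizes = {"a": 10, "b": 10}
    access = {"a": {"dc1": 5}, "b": {"dc1": 5}}
    print(place_objects(sizes, access, ["dc1", "dc2"], alpha=1.0))
    print(place_objects(sizes, access, ["dc1", "dc2"], alpha=0.0))
```

With `alpha=1.0` the sketch considers only load balance and spreads the two objects across both centers; with `alpha=0.0` it considers only transfer cost and co-locates both objects with their readers. A single pass over objects and centers keeps the running time polynomial, which is the property the abstract claims for the dissertation's own algorithm.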
Keywords/Search Tags: Distributed data storage, Data-intensive applications, Data placement, Replication, Data migration, Small files