Improved Algorithm And Performance Optimization Of Distributed Storage System Based On NoSQL

Posted on: 2017-01-09
Degree: Master
Type: Thesis
Country: China
Candidate: K Wang
Full Text: PDF
GTID: 2278330485462759
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of network storage systems, applications based on distributed storage are growing at an unprecedented pace, driven by websites, cloud services, and other workloads that need to store large amounts of data. However, much of this data is currently deployed on single-node storage devices, and as the scale of the data expands, the resources of a single host can no longer accommodate it. Because subsequent expansion is expensive, there is an urgent need to introduce distributed storage systems to solve the problem of storing and accessing large data sets. At the same time, with the development of e-commerce and the wide adoption of Web 2.0 technology in network applications, traditional relational databases can no longer meet today's data storage requirements. NoSQL databases supplement relational databases: through simple data models, separation of metadata from application data, weak consistency, and other techniques, they achieve effective management of large data. Based on the above, this research focuses on data distribution, data compression, and storage formats in NoSQL-based distributed storage. Building on an analysis and summary of related research at home and abroad, it studies an improved consistent hashing algorithm based on Redis and performance optimization based on Hive, and, taking the application of Redis in lists as the background, proposes an effective performance analysis and evaluation of Redis. The main research work is as follows: (1) An improved consistent hashing algorithm based on Redis is proposed to solve the data-balancing problem in distributed storage systems and to improve the reliability and availability of the algorithm in practical applications.
The Redis storage nodes are logically divided into groups, and a master-slave mode within each group improves the consistency and reliability of the distributed storage; data consistency under different read/write strategies within a group is also analyzed. When a group's master node goes down, the data on the backup node is used and a switchover allows service to resume promptly. Experiments show that the algorithm effectively reduces the average response time and improves system throughput, so that the load across the distributed storage system is more balanced. (2) Hive performance optimization is studied to address data compression and storage formats in the file system of a distributed storage system, approaching Hive from two aspects: optimizing MapReduce job scheduling and tuning Hive performance. Starting from the MapReduce programming model, the execution process is analyzed and parameters are tuned on both the map side and the reduce side. Then, from the perspective of the Hive framework, the study covers partitioned tables and external tables, common data file compression, and row-oriented and column-oriented storage. Experimental results show that Snappy compression and the ORCFile/Parquet storage formats can improve query efficiency for column-oriented queries.
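The grouping idea in item (1) can be illustrated with a minimal sketch. The abstract does not give the algorithm's details, so everything below is an assumption made for illustration: the MD5 hash, the virtual-node count `replicas`, the class names `Group` and `GroupedHashRing`, and the node addresses are hypothetical, and the failover rule (promote the first backup node) is a simplification rather than the thesis's actual method.

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    """Map a string to a point on the ring (MD5 is an assumed choice)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Group:
    """Hypothetical logical group: one master plus backup (slave) nodes."""

    def __init__(self, name, master, slaves):
        self.name = name
        self.master = master
        self.slaves = list(slaves)

    def fail_over(self):
        """Promote the first backup node when the master goes down."""
        if not self.slaves:
            raise RuntimeError(f"group {self.name} has no backup node")
        self.master = self.slaves.pop(0)
        return self.master


class GroupedHashRing:
    """Consistent hash ring whose entries are groups, not single nodes."""

    def __init__(self, groups, replicas=100):
        self._points = []  # sorted hash points on the ring
        self._owner = {}   # hash point -> owning Group
        for group in groups:
            # Several virtual points per group smooth out the key distribution.
            for i in range(replicas):
                point = ring_hash(f"{group.name}#{i}")
                self._owner[point] = group
                bisect.insort(self._points, point)

    def group_for(self, key):
        """Return the group at the first ring point clockwise from the key."""
        idx = bisect.bisect_right(self._points, ring_hash(key))
        return self._owner[self._points[idx % len(self._points)]]


# Example with made-up node addresses.
ring = GroupedHashRing([
    Group("g0", "redis-a:6379", ["redis-a2:6379"]),
    Group("g1", "redis-b:6379", ["redis-b2:6379"]),
    Group("g2", "redis-c:6379", ["redis-c2:6379"]),
])
group = ring.group_for("user:42")
print(group.name, group.master)
group.fail_over()  # the key still maps to the same group, now served by the backup
print(group.name, group.master)
```

Because keys are mapped to groups rather than to individual nodes, a master failure in this sketch changes only which node inside the group serves the key; the ring itself, and therefore the placement of every other key, is unchanged.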
Keywords/Search Tags:NoSQL, Distributed Storage System, Consistent Hashing, Data Compression, Storage Format