
Design And Implementation Of The Distributed File System For The Column-oriented Database

Posted on: 2018-12-27
Degree: Master
Type: Thesis
Country: China
Candidate: Z S Li
GTID: 2348330536477917
Subject: Software engineering
Abstract/Summary:
With the advent of the big data era, massive volumes of data pose a serious challenge to traditional data analysis systems. On the one hand, data generated on the Internet, for example by online shopping and social networking sites, conceals a wealth of social information that, once extracted through data mining, can bring considerable social, economic, and scientific research value. On the other hand, the massive growth of data increases the difficulty of storage and computation, and the requirements for responsiveness, reliability, and stability have risen to a new level. The limitations of traditional file storage systems and traditional relational database systems are becoming increasingly obvious, whereas distributed storage systems and database technology can address these problems.

Because existing distributed storage systems cannot meet the storage requirements of column-oriented databases, this paper designs a distributed storage system for the column-oriented database. Given that column-oriented database storage is query-performance-oriented and data-intensive, this paper introduces a number of Monitor Nodes and Operation Nodes to take over some functions of the centralized management node. It also proposes a peer-to-peer load-balancing strategy to replace the centralized load management function. Overall, this is an optimization of the current mainstream distributed file system architecture. Building on this architecture, this paper designs a peer-to-peer shared cache, bringing cache support to the distributed storage system. This paper also designs a configurable compression framework for the distributed storage system that is suited to compressing column-oriented database data. With this framework, compression algorithms can be dynamically added and configured for different types of data, and transparent compression is provided for specific types of data.

This paper also implements the distributed storage system and the peer-to-peer shared cache architecture. Experimental results show that the system performs well in reading and writing: in a 100 Mbps network environment, the write speed for 1 MB of data is 16.81 times that of HDFS; average read speed increases by 25.34% and write speed by 18.25%; and the batch upload speed for column database files increases by 19.8% compared with HDFS. In addition, the cache architecture and the compression framework are both usable, and the test results meet the design expectations. Overall, the distributed storage system designed and implemented in this paper performs well. Compared with mainstream distributed storage systems, it is innovative and experimental, and it is adapted and targeted to the column-oriented database, providing good storage support for distributed computing on column databases.
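The configurable compression framework described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the names `Codec`, `CompressionRegistry`, `register`, `configure`, `encode`, and `decode` are hypothetical, the one-byte name header is an assumed block layout, and the standard-library `zlib` and `lzma` codecs stand in for whatever algorithms the author used. The sketch only shows the general idea of adding codecs at run time, mapping them to column data types, and decompressing transparently on read.

```python
import lzma
import zlib
from typing import Callable, Dict, NamedTuple


class Codec(NamedTuple):
    """A named pair of compress/decompress functions (hypothetical type)."""
    name: str
    compress: Callable[[bytes], bytes]
    decompress: Callable[[bytes], bytes]


class CompressionRegistry:
    """Maps column data types to codecs that can be added at run time."""

    def __init__(self) -> None:
        self._codecs: Dict[str, Codec] = {}
        self._by_type: Dict[str, str] = {}

    def register(self, codec: Codec) -> None:
        # The codec name is stored in a one-byte-length header, so it must
        # be non-empty ASCII shorter than 256 bytes.
        tag = codec.name.encode("ascii")
        if not 0 < len(tag) < 256:
            raise ValueError("codec name must be 1..255 ASCII bytes")
        self._codecs[codec.name] = codec

    def configure(self, data_type: str, codec_name: str) -> None:
        # Bind a data type to a codec that has already been registered.
        if codec_name not in self._codecs:
            raise KeyError(f"unknown codec: {codec_name}")
        self._by_type[data_type] = codec_name

    def encode(self, data_type: str, payload: bytes) -> bytes:
        # Unconfigured types are stored uncompressed (empty codec name).
        name = self._by_type.get(data_type, "")
        body = self._codecs[name].compress(payload) if name else payload
        tag = name.encode("ascii")
        return bytes([len(tag)]) + tag + body

    def decode(self, block: bytes) -> bytes:
        # The header names the codec, so readers need no configuration.
        n = block[0]
        name = block[1:1 + n].decode("ascii")
        body = block[1 + n:]
        return self._codecs[name].decompress(body) if name else body


registry = CompressionRegistry()
registry.register(Codec("zlib", zlib.compress, zlib.decompress))
registry.register(Codec("lzma", lzma.compress, lzma.decompress))
registry.configure("int64", "zlib")
registry.configure("string", "lzma")

column = b"2018-12-27," * 1000           # a repetitive column of values
block = registry.encode("string", column)
restored = registry.decode(block)        # caller never names the codec
```

Storing the codec name in each block is one way to keep reads transparent even after the per-type configuration changes; a real system might record it in file metadata instead.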
Keywords/Search Tags: distributed storage system, distributed file system, column-oriented database