The Design and Implementation of a Distributed Non-transaction Column-oriented Storage Engine

Posted on: 2016-03-17
Degree: Master
Type: Thesis
Country: China
Candidate: M R Gan
Full Text: PDF
GTID: 2308330473456002
Subject: Computer system architecture
Abstract/Summary:
In recent years, as network technology has developed, the ways of accessing the Internet have steadily improved, and more and more user data is generated online. Faced with this explosive growth of data, most data analysis systems are still disk-based. Although many companies have made great progress in distributing such systems, this architecture still poses serious problems for future analysis: queries and imports are slow, and resource utilization is low. To address these problems, this thesis surveys existing research, considers the development trends of the industry, analyzes the architectures and features of existing data warehouse systems, and finally designs and implements a memory-based storage engine (Memory Database Engine, MDE) tailored to our specific needs. The system gives an enterprise a high-speed platform for massive data processing: a distributed data warehouse system with real-time response, high reliability, and high scalability that better supports upper-layer analysis methods. Enterprises can thus quickly extract useful data from complex, redundant data and respond to a rapidly changing market as soon as possible. The main innovations are as follows:

1. Design the system architecture of the in-memory data storage engine. Data is stored in memory in a column-oriented layout, on the grounds that column stores compress data more efficiently and speed up queries. The system also uses hot standby to ensure high reliability of the cluster.
2. Design data structures for effective compression and fast incremental loading into the warehouse, which improve computational efficiency while saving memory resources.
3. Use the epoll asynchronous event-driven model as the network I/O model; combined with pure in-memory operation, performance may increase by orders of magnitude. Use a thread pool to handle asynchronous disk operations, which increases the system's concurrent processing capacity.
4. Provide interfaces for most database physical execution operations, so that results are returned directly to the upper layer after processing, which reduces network transmission and resource waste on the scheduling node.
5. Allocate tasks dynamically. The control node knows the real-time status of each node; when a new request arrives, it assigns the storage or compute job to an appropriate node according to load status. The system also balances itself automatically: under certain conditions, heavily loaded nodes transfer a portion of their data to lightly loaded nodes.

Functional and stress testing show that the system can effectively store and query massive data, greatly shorten response time, and achieve load balancing, which meets the main requirements of our design.
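
The dynamic task allocation in item 5 can be illustrated with a minimal sketch. This is not the thesis's implementation: the class names, the single integer load metric, the "move half the gap" rule, and the threshold parameter are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    load: int = 0  # hypothetical load metric, e.g. number of stored data blocks


class ControlNode:
    """Toy control node that tracks per-node load (illustrative only)."""

    def __init__(self, names):
        self.nodes = {n: Node(n) for n in names}

    def assign(self, cost=1):
        # Pick the least-loaded node; ties are broken by name for determinism.
        target = min(self.nodes.values(), key=lambda n: (n.load, n.name))
        target.load += cost
        return target.name

    def rebalance(self, threshold):
        # If the gap between the heaviest and lightest node exceeds the
        # (assumed) threshold, move half of the gap to the lightest node.
        heavy = max(self.nodes.values(), key=lambda n: (n.load, n.name))
        light = min(self.nodes.values(), key=lambda n: (n.load, n.name))
        gap = heavy.load - light.load
        if gap <= threshold:
            return 0
        moved = gap // 2
        heavy.load -= moved
        light.load += moved
        return moved


if __name__ == "__main__":
    control = ControlNode(["a", "b", "c"])
    print([control.assign() for _ in range(3)])  # one job lands on each node
    control.assign(cost=10)                      # a large job skews the load
    print(control.rebalance(threshold=4))        # amount of load moved
    print({n.name: n.load for n in control.nodes.values()})
```

A real control node would rely on the real-time status reported by each node rather than a local counter, and would move actual data rather than adjust a number; the sketch only shows the shape of the decision.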
Keywords/Search Tags: distributed system, load balance, in-memory warehouse, column-oriented storage