Research On Data Sharding Problem Based On Relational Database

Posted on: 2017-03-06
Degree: Master
Type: Thesis
Country: China
Candidate: X L Dong
GTID: 2308330485982067
Subject: Computer Science and Technology
Abstract/Summary:
With the continuous development of China's economy, living standards have steadily improved and network technology has spread rapidly. By the end of 2015, the number of Internet users in China had reached 688 million. Together with upgrades to network equipment and transmission media, network speeds have improved greatly, resulting in massive application datasets. Some large Internet companies produce tens of terabytes of data and serve billions of requests every day. This places higher demands on the performance of traditional relational databases, whose storage capacity is limited and whose ability to scale out is weak. As a result, traditional relational databases can no longer meet the needs of large-scale data storage and heavy access loads.

A distributed database system is an ideal solution to the problems of massive data storage and access: it combines many stand-alone databases into a unified whole that provides distributed storage and access. Since they were first proposed in the 1970s, distributed databases have been developed for many years and have given rise to many excellent products. The main problem in moving application data from a single database to a distributed database system is data sharding, namely which algorithm is used to split the data across different physical nodes.

This paper focuses on the following two aspects of data sharding. First, building on existing data sharding algorithms, a new algorithm is proposed that combines range sharding and hash sharding. Globally, data is split by incremental intervals, and each resulting shard is assigned not to a single data node but to a group of nodes. Locally, within the node group, the shard is distributed evenly across the data nodes in the group. This algorithm inherits the advantages of both range sharding and hash sharding: uniform data distribution and strong scalability.
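The two-level routing described above can be sketched as follows. This is only an illustrative sketch, not the thesis's implementation: it assumes integer keys, a fixed interval width, and a plain modulo as the hash step, and the names `HybridRouter`, `NodeGroup`, `interval_size`, and the node labels are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class NodeGroup:
    """A hypothetical group of data nodes that serves one key interval."""
    nodes: list[str]


@dataclass
class HybridRouter:
    """Range sharding across node groups, hash sharding inside a group."""
    interval_size: int  # assumed fixed width of each key interval
    groups: list[NodeGroup] = field(default_factory=list)

    def add_group(self, nodes: list[str]) -> None:
        # Expansion: the new group serves the next key interval, so keys
        # already routed to earlier groups stay where they are.
        self.groups.append(NodeGroup(nodes))

    def route(self, key: int) -> str:
        if key < 0:
            raise ValueError("keys are assumed to be non-negative integers")
        group_index = key // self.interval_size  # range step
        if group_index >= len(self.groups):
            raise KeyError(f"no node group covers key {key}")
        group = self.groups[group_index]
        # Hash step: modulo stands in for whatever hash function is used.
        return group.nodes[key % len(group.nodes)]


router = HybridRouter(interval_size=1000)
router.add_group(["a0", "a1"])        # serves keys 0..999
before = router.route(5)
router.add_group(["b0", "b1", "b2"])  # serves keys 1000..1999
after = router.route(5)               # unchanged by the expansion
```

In this sketch, adding a group only extends the covered key range, which is one way to read the claim that expansion needs no data migration; how the thesis sizes intervals and groups is not stated in the abstract.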
The proposed algorithm also avoids the respective disadvantages of range sharding and hash sharding. Compared with the consistent hashing algorithm, it performs well in both data access and expansion. In particular, capacity can be expanded conveniently without migrating any data. In practical application, the algorithm addresses the problem of query latency on large-scale data and has good application value.

Second, the paper addresses the application of data sharding. To put data sharding into practice, two basic problems must be solved: the uniqueness of auto-increment keys and the distributed join. Both are fundamental to moving a database from a single node to a distributed cluster, and both have an important influence on the availability and performance of distributed systems. Building on previous studies and an analysis of various scenarios, I give a solution to each problem and implement it in code. For the non-uniqueness of auto-increment primary keys, I give a global sequence generation solution that maintains a global sequence table to generate auto-increment primary keys. For the distributed join problem, because of its high cost, the first consideration is to avoid distributed joins altogether. Data redundancy and derived horizontal fragmentation are two effective ways to avoid distributed joins. If neither method is suitable, the direct join algorithm is considered. The performance of global sequence generation and direct join was tested; the results met expectations and show that, to a certain extent, these methods solve the problems.
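A global sequence table of the kind mentioned above might look like the following minimal sketch. It uses Python's built-in `sqlite3` module as a stand-in for the central database; the table name `global_sequence`, its columns, and the helper functions are assumptions made for illustration, not the schema used in the thesis.

```python
import sqlite3


def create_sequence(conn: sqlite3.Connection, name: str, start: int = 1) -> None:
    """Create the (hypothetical) sequence table and register one sequence."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS global_sequence ("
        "name TEXT PRIMARY KEY, next_value INTEGER NOT NULL)"
    )
    conn.execute(
        "INSERT OR IGNORE INTO global_sequence (name, next_value) VALUES (?, ?)",
        (name, start),
    )
    conn.commit()


def next_ids(conn: sqlite3.Connection, name: str, count: int = 1) -> range:
    """Reserve `count` consecutive ids from the named sequence."""
    with conn:  # one transaction: commit on success, roll back on error
        cur = conn.execute(
            "UPDATE global_sequence SET next_value = next_value + ? WHERE name = ?",
            (count, name),
        )
        if cur.rowcount != 1:
            raise KeyError(name)
        (end,) = conn.execute(
            "SELECT next_value FROM global_sequence WHERE name = ?", (name,)
        ).fetchone()
    return range(end - count, end)


conn = sqlite3.connect(":memory:")
create_sequence(conn, "orders")
single = list(next_ids(conn, "orders"))     # one id for one insert
batch = list(next_ids(conn, "orders", 3))   # a reserved block of three ids
```

Reserving ids in blocks is a common way to reduce round trips to the sequence table; whether the thesis allocates ids singly or in batches is not stated in the abstract.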
Keywords/Search Tags: Distributed Database, Data Sharding, Extensibility, Distributed Join