Research On Data Sharding Problem Based On Relational Database

Posted on:2017-03-06

Degree:Master

Type:Thesis

Country:China

Candidate:X L Dong

Full Text:PDF

GTID:2308330485982067

Subject:Computer Science and Technology

Abstract/Summary:

With the continued growth of China's economy, living standards have risen steadily and network technology has spread rapidly. By the end of 2015, the number of Internet users in China had reached 688 million. Together with upgrades to network equipment and transmission media, network speeds have improved greatly, which has resulted in massive application datasets. Some large Internet companies produce tens of terabytes of data and serve billions of requests every day. This places higher demands on the performance of traditional relational databases, whose storage capacity is limited and whose ability to scale out is weak. As a result, the traditional relational database can no longer meet the needs of large-scale data storage and heavy access traffic. A distributed database system is an ideal solution to massive data access and storage problems: it combines many stand-alone databases into a unified whole to provide distributed storage and access. Since the concept was proposed in the 1970s, distributed databases have been developed for many years and have given rise to many excellent products. The main problem in moving application data from a single database to a distributed database system is data sharding, namely which algorithm is used to split the data across different physical nodes.

This thesis focuses on the following two aspects of data sharding. First, building on existing data sharding algorithms, a new algorithm is proposed that combines range sharding and hash sharding. Globally, data is split by incremental intervals, and each resulting shard is assigned not to a single data node but to a group of nodes. Locally, within a node group, the shard's data is distributed evenly across the data nodes in the group. The algorithm inherits the advantages of range sharding and hash sharding, namely uniform data distribution and strong scalability, while avoiding their disadvantages. Compared with the consistent hashing algorithm, the proposed algorithm performs well in both data access and expansion. In particular, it makes capacity expansion very convenient, with no need to migrate any data. In practical application, the algorithm solves the problem of query latency on large-scale data and has good application value.

Second, the thesis addresses the application of data sharding. To put data sharding into practice, two basic problems must be solved: the uniqueness of auto-increment keys and distributed joins. Both are fundamental problems in moving a database from a single node to a distributed cluster, and both have an important influence on the availability and performance of distributed systems. Building on the results of previous studies, this thesis discusses and analyzes various scenarios, gives a solution to each problem, and implements the solutions in code. For the non-uniqueness of auto-increment primary keys, a global sequence generation scheme is given, which maintains a global sequence table to generate auto-increment primary keys. For distributed joins, because of their high cost, the first consideration is to avoid them. Data redundancy and derived horizontal fragmentation are two effective ways to avoid distributed join operations. If neither method is suitable, the direct-join algorithm is considered. The performance of global sequence generation and direct join was tested, and the results met expectations, showing that the two solutions can solve these problems to a certain extent.
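The combination of range sharding and hash sharding described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the class name `RangeHashRouter`, the node names, and the use of `key % len(nodes)` as the hash step for integer keys are all assumptions made for illustration. Each node group owns a key interval that starts at `start_key` and runs until the next group's start; inside a group, keys are spread across the nodes by the hash step.

```python
import bisect


class RangeHashRouter:
    """Hypothetical router: range sharding across groups, hashing inside a group."""

    def __init__(self) -> None:
        self._starts: list[int] = []        # first key owned by each group, ascending
        self._groups: list[list[str]] = []  # node names of each group

    def add_group(self, start_key: int, nodes: list[str]) -> None:
        """Append a node group that owns all keys from start_key upward.

        Assumption: a new group is added before any key >= start_key is written,
        so existing rows never change owner and nothing has to be migrated.
        """
        if not nodes:
            raise ValueError("a node group needs at least one node")
        if self._starts and start_key <= self._starts[-1]:
            raise ValueError("groups must be added in increasing key order")
        self._starts.append(start_key)
        self._groups.append(list(nodes))

    def locate(self, key: int) -> str:
        """Return the node that stores the row with this integer key."""
        # Range step: find the last group whose start_key is <= key.
        idx = bisect.bisect_right(self._starts, key) - 1
        if idx < 0:
            raise KeyError(f"no node group covers key {key}")
        nodes = self._groups[idx]
        # Hash step: spread keys evenly over the nodes of that group.
        return nodes[key % len(nodes)]


if __name__ == "__main__":
    router = RangeHashRouter()
    router.add_group(0, ["a0", "a1", "a2", "a3"])
    print(router.locate(42))      # a node from the first group
    router.add_group(1000, ["b0", "b1"])
    print(router.locate(42))      # unchanged by the expansion
    print(router.locate(1001))    # a node from the new group
```

In this sketch, expansion only appends a new group for a higher key interval, so the placement of existing keys is untouched. That is the property the abstract contrasts with consistent hashing, where adding a node moves part of the existing data.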

Keywords/Search Tags:

Distributed Database, Data Sharding, Extensibility, Distributed Join

Related items

1	Research On Join Query Optimization Algorithm In Distributed Database
2	Distributed Database Multi-join Query Optimization Algorithm
3	The Design And Implementation Of Distributed Data Management System For Large-Scale Virtual Screening
4	Research On Data Query Optimization In Distributed Database
5	A Study Of Multi-join Query Optimization Algorithm In Distributed Database
6	Design And Research Of SAAS-Based Database Architecture
7	Research On Query Optimizing In Distributed Database
8	Query Optimization Technique For Distributed Database System Based On Semi-Join Algorithm
9	Research On Multi-Join Query Optimization Of Distributed Database Based On Genetic Algorithm
10	Auto-sharding Technique And Algorithm For Distributed Relation Database Based On SQL History