Research On Data Partition Optimization Method Of Shared-Nothing Relational In-Memory Database

Posted on: 2020-10-10
Degree: Master
Type: Thesis
Country: China
Candidate: X M Huang
GTID: 2428330620460074
Subject: Software engineering
Abstract/Summary:
Modern online transaction processing (OLTP) applications demand both scalability and ACID guarantees, which has given rise to shared-nothing distributed in-memory DBMSs. Such a DBMS distributes data across multiple partitions, and a transaction that accesses data in more than one partition becomes a distributed transaction. Distributed transactions have a large impact on performance because cross-partition coordination requires expensive network communication. To improve performance by reducing distributed transactions, many partition-based solutions have been proposed. Most of these solutions model and analyze the co-access relationships among the data in the DBMS and then divide the data tuples into partitions to generate a new data partitioning plan. By placing most tuples that share co-access relationships in the same partition, the plan reduces the number of distributed transactions and thereby improves DBMS performance.

Beyond reducing the number of distributed transactions, these solutions and their partitioning plans must satisfy further requirements: the time the solution takes to run, the load balance of the partitioning plan, and requirements arising from the community structure in the co-access relationships. Commgraph is one of the partition-based solutions that currently delivers a good performance improvement for such DBMSs. It uses a fine-grained graph modeling approach, performs community detection on the graph, and then places the discovered communities into partitions. In this way, Commgraph generates a partitioning plan that eliminates a large number of distributed transactions while satisfying the load-balancing requirement and the requirements arising from the community structure.

However, this thesis finds that the fine-grained modeling in Commgraph takes a long time to generate the partitioning plan when the DBMS holds a large amount of data, so it cannot meet the time-consumption requirement. In addition, this thesis identifies a "small community partition occupation" problem in Commgraph's community placement module. When placing a community, the original Commgraph considers only whether the community fits in the current partition and ignores whether an earlier partition could also hold it. As a result, some smaller communities are not placed appropriately and occupy additional partitions, so the partitioning plan uses more partitions than necessary.

This thesis addresses both problems with two improvements to Commgraph. The first improves Commgraph's community placement module to reduce the number of partitions used in the partitioning plan. The second is a graph compression tool called ACTDP, which compresses the co-access graph by vertex aggregation and supplies the compressed graph to Commgraph. This greatly reduces Commgraph's running time, allowing it to produce the partitioning plan quickly and meet the time-consumption requirement.

Finally, this thesis evaluates both improvements in detail. The evaluation of the community placement module shows that the improved module produces partitioning plans that use the same number of partitions as the original module or fewer, without reducing Commgraph's optimization effect on the DBMS. The evaluation of ACTDP shows that it greatly reduces the time Commgraph needs to generate a partitioning plan, and that the reduction becomes more pronounced as the data volume grows. On a database with 1 million tuples, the time consumption is about 80% lower than with the original Commgraph, while the throughput achieved when the resulting partitioning plan is used to optimize the DBMS is reduced by only about 15%. The two improvements therefore address Commgraph's small community partition occupation problem and its high time consumption, which is of great significance for partition optimization in shared-nothing distributed relational DBMSs that hold large amounts of data.
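
The "small community partition occupation" problem can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes that each community is represented only by its tuple count, that every partition has the same fixed capacity, and that the improved placement behaves like a first-fit search over all earlier partitions. The function names `place_current_only` and `place_first_fit` and the example sizes are hypothetical.

```python
from typing import List


def place_current_only(sizes: List[int], capacity: int) -> List[List[int]]:
    """Assumed baseline: a community may only go into the current
    (most recently opened) partition; otherwise a new one is opened."""
    partitions: List[List[int]] = []
    for size in sizes:
        if size > capacity:
            raise ValueError("community exceeds partition capacity")
        if not partitions or sum(partitions[-1]) + size > capacity:
            partitions.append([])  # open a new partition
        partitions[-1].append(size)
    return partitions


def place_first_fit(sizes: List[int], capacity: int) -> List[List[int]]:
    """Assumed improvement: check every earlier partition before
    opening a new one (first-fit)."""
    partitions: List[List[int]] = []
    for size in sizes:
        if size > capacity:
            raise ValueError("community exceeds partition capacity")
        for partition in partitions:
            if sum(partition) + size <= capacity:
                partition.append(size)
                break
        else:
            # No existing partition has room, so open a new one.
            partitions.append([size])
    return partitions


if __name__ == "__main__":
    community_sizes = [6, 5, 4, 3, 2]  # hypothetical tuple counts
    print(place_current_only(community_sizes, capacity=10))
    print(place_first_fit(community_sizes, capacity=10))
```

With these hypothetical sizes, the current-partition-only strategy opens three partitions, while the first-fit variant packs the small communities into earlier partitions and needs only two.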
Keywords/Search Tags:OLTP, DBMS, Shared-nothing, Distributed, Partition, Community Structure