
Query Processing In Distributed Information Networking Database Management System

Posted on: 2018-03-27
Degree: Master
Type: Thesis
Country: China
Candidate: J Xu
Full Text: PDF
GTID: 2428330515989691
Subject: Computer software and theory
Abstract/Summary:
With the arrival of the Internet era, the volume of data on the Internet is growing at an alarming rate, and global data traffic has now reached the zettabyte (ZB) level. The big data era has three basic characteristics, namely volume, variety, and velocity, which pose challenges to current databases. Traditional centralized storage cannot adapt to big data because of its low reliability and poor scalability, so distributed parallel systems have become the first choice. The most popular databases are based on the relational model, which manages structured data effectively but cannot represent semi-structured and unstructured data. NoSQL databases have therefore developed quickly to analyze such data, and new data models have appeared as well. The Information Networking Model (INM) is a new semantic database model that naturally expresses real-world entity objects and the various semantic relationships among them. Distributed parallel INM is its distributed extension: it inherits the advantages of INM and can store massive data using a shared-nothing architecture. This thesis studies how to query massive semantic data efficiently, based on the INM model in a distributed parallel environment.

In a distributed cluster, data is partitioned across nodes according to a data partitioning algorithm. A single node therefore often cannot execute a query on its own and must fetch data from other nodes in the cluster. This step incurs extra, expensive network communication, especially for complex multi-join queries. Existing research mostly addresses the problem in two ways: data partitioning algorithms and optimized query plans. Against this background, this thesis proposes two optimized query plans, each based on a corresponding data partitioning algorithm.

With the consistent hashing algorithm, data is stored across the cluster in a load-balanced way. Splitting a query into subqueries and executing them in parallel can improve efficiency. However, this produces a large amount of redundant data in network transmission, and the merge task at the controller is also heavy. The thesis therefore proposes the Traffic Query Split Algorithm, which aims to split a query into several PWOC (parallelizable without communication) subqueries, so that the subqueries can run in parallel with almost no communication. With the organization dynamic rebalancing partition algorithm, closely correlated data is stored on the same node. In this setting, a simple query strategy can be more efficient, so the thesis proposes a lightweight, simple query plan based on message delivery. This strategy solves the communication problem by generating a new query and a message whenever data must be fetched from another node.

Finally, the system generates test data from DBpedia and runs a comparative experiment between the previous algorithm and the two optimized algorithms. The results show that the optimized algorithms improve query efficiency, especially for complex queries. The system also tests subquery balance experimentally, and the results show that the Traffic Query Split Algorithm splits queries in a balanced way.
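The abstract does not describe how consistent hashing is implemented in the system. As a rough illustration only, the general technique of placing objects on cluster nodes can be sketched as below. The class name `ConsistentHashRing`, the `replicas` parameter (virtual nodes per physical node), the use of MD5, and the node and object names are all assumptions made for this sketch and are not taken from the thesis.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Stable hash; Python's built-in hash() is salted per process.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    """Hypothetical sketch of a consistent hash ring with virtual nodes."""

    def __init__(self, nodes, replicas=100):
        # Each physical node is placed on the ring `replicas` times
        # so that keys spread more evenly across nodes.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(replicas)
        )
        self._hashes = [h for h, _ in self._ring]

    def node_for(self, key: str) -> str:
        # A key belongs to the first ring point clockwise from its hash,
        # wrapping around to the start of the ring if necessary.
        idx = bisect.bisect_right(self._hashes, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
owner = ring.node_for("object-42")  # one of the three node names
```

In a scheme like this, adding a node only moves the keys that now fall on the new node's ring points, which is why it is commonly used for load-balanced placement. It does not keep related objects together, which matches the abstract's motivation for a query split step on top of it.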
Keywords/Search Tags:query optimization, distributed parallel processing, Information Network Model, data partition, jump object