Research On Distributed Query Optimization And Implementation Of Data Governance Platform

Posted on: 2021-04-14  Degree: Master  Type: Thesis
Country: China  Candidate: L Wu  Full Text: PDF
GTID: 2428330620468773  Subject: Engineering
Abstract/Summary:
In the 21st century, computing and communication technologies are advancing rapidly, people increasingly rely on electronic terminal devices to exchange information, and large-scale data exchange has become a defining feature of the era. Big data distributed systems emerged to make daily life and work more convenient, and they have become an active topic in both research and application. To meet the current demand for big data processing, this thesis implements a data governance platform built on a distributed big data processing system. The platform responds to user operations through a front end, then uses components of the distributed system to store data and process it according to specific business needs, so that large volumes of data from different platforms can be acquired and applied.

In big data processing, query algorithms over distributed databases strongly affect a system's real-time response and processing efficiency, so this thesis studies them in detail. It describes distributed systems and their data partitioning and allocation strategies, and introduces in detail the traditional semi-join query algorithm and the traditional partition-based direct-join query algorithm. Because the two algorithms suit different scenarios, the problems of each are analyzed and addressed separately.

For the semi-join query algorithm, to compensate for slow network communication within a cluster, the thesis proposes a new strategy for transmitting data fragments among the sites involved in a query. It also proposes joining multiple tables using projected data sets that have not been de-duplicated. Both ideas exploit the parallelism of a distributed cluster to reduce the number of tuples taking part in the join and to lower the cost of network transmission. The proposal is analyzed theoretically using the concept of the selection factor, and an experiment that simulates distributed cluster communication is designed to verify its validity.

For the direct-join query algorithm, the thesis keeps the algorithm's original advantages and introduces a new partitioning strategy, so that a query no longer partitions on the attributes of only one relation. This further reduces redundant relation data in local queries. The original partition algorithm and the improved algorithm are both tested on queries with multiple joins, and the results also verify the effectiveness of the new algorithm.

Finally, a data management platform based on data storage and processing is implemented. It supports collecting structured data (two-dimensional tables from multiple relational data platforms) and unstructured data (txt, Word and other text files) from different platforms, and it ensures the security of communication. Users can check and modify the data brought into the big data platform, and unified access and query functions are provided according to specific user needs. The platform follows the current mainstream enterprise development framework and builds a website architecture that meets current business needs.
Keywords/Search Tags:Distributed database, query optimization, semi-connected query, direct connection query, data fusion analysis platform
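
The semi-join idea summarized in the abstract can be illustrated with a minimal, single-process sketch. This is not the thesis's algorithm or code: the two "sites" are plain Python lists, the function names (`semi_join_reduce`, `hash_join`, `semi_join_query`), the `dedupe` flag, and the sample tables are all hypothetical, and the "shipped" counts only tally how many values or rows would cross the network under these assumptions. Setting `dedupe=False` mimics sending the projected join column without de-duplication, which the abstract mentions as one of its ideas; the actual transmission strategy and cost model are not given in the abstract.

```python
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, object]


def semi_join_reduce(rows: List[Row], key: str, shipped_keys: Iterable[object]) -> List[Row]:
    """Keep only the rows whose join key appears in the shipped projection."""
    key_set = set(shipped_keys)
    return [row for row in rows if row[key] in key_set]


def hash_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Plain in-memory hash join on a single column."""
    index: Dict[object, List[Row]] = {}
    for right_row in right:
        index.setdefault(right_row[key], []).append(right_row)
    return [
        {**left_row, **right_row}
        for left_row in left
        for right_row in index.get(left_row[key], [])
    ]


def semi_join_query(
    site_a: List[Row], site_b: List[Row], key: str, dedupe: bool = True
) -> Tuple[List[Row], int]:
    """Join site_a with site_b, returning the result and a shipped-item count."""
    # Step 1: site A projects the join column (optionally de-duplicated).
    projection = [row[key] for row in site_a]
    if dedupe:
        projection = list(dict.fromkeys(projection))  # order-preserving dedupe
    # Step 2: the projection is "sent" to site B, which drops non-matching rows.
    reduced_b = semi_join_reduce(site_b, key, projection)
    # Step 3: only the reduced relation is "sent" back to site A for the join.
    result = hash_join(site_a, reduced_b, key)
    shipped = len(projection) + len(reduced_b)
    return result, shipped


# Hypothetical sample data: orders live on site A, customers on site B.
orders = [
    {"cust_id": 1, "item": "disk"},
    {"cust_id": 1, "item": "cable"},
    {"cust_id": 3, "item": "fan"},
]
customers = [
    {"cust_id": 1, "name": "Ann"},
    {"cust_id": 2, "name": "Bob"},
    {"cust_id": 3, "name": "Cy"},
    {"cust_id": 4, "name": "Di"},
    {"cust_id": 5, "name": "Eve"},
    {"cust_id": 6, "name": "Fay"},
]

joined, shipped_dedup = semi_join_query(orders, customers, "cust_id")
_, shipped_raw = semi_join_query(orders, customers, "cust_id", dedupe=False)
print(len(joined), shipped_dedup, shipped_raw)
```

In this toy setup the de-duplicated projection ships fewer key values, while the non-de-duplicated variant skips the de-duplication step at site A. Which choice is cheaper in a real cluster depends on the selection factor and network cost, which is the trade-off the thesis analyzes.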