
Design And Implementation Of A Distributed Hybrid Index Structure Based On Spark

Posted on: 2021-01-09
Degree: Master
Type: Thesis
Country: China
Candidate: S Zhan
Full Text: PDF
GTID: 2428330611498039
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet, users have generated massive amounts of data, and the technology for storing and retrieving that data has become increasingly important. Traditional single-machine storage makes data processing simple, but its storage cost is high and its capacity is limited, so a distributed solution is urgently needed. When dealing with large and complex data, appropriate indexes can effectively speed up retrieval. A distributed index method is therefore needed to serve big data processing, and this thesis proposes a distributed hybrid index structure as a way to improve big data retrieval performance.

The index structure is guided by distributed systems theory and uses the widely adopted Spark framework as its basis for distributed computing. Spark distributes big data across several nodes for parallel computation and combines this with the in-memory computing of RDDs, which greatly speeds up calculation. The index structure is designed around these features and is offered as a service: the main control module exposes an interface for index operations to external applications, which users can call to build indexes and query data.

The system also adds a web display function. By calling the more complex system interfaces, it provides a web front end and a back-end index data management system that support the management of users and index data. It can also draw heat maps for analyzing data heat and tree diagrams for exploring the relationships between data, which makes the distribution of the data easier to observe.

Testing showed that the system lets users build distributed indexes and retrieve indexed data more conveniently, and that it can provide index functions for large data warehouses. The system is also robust and extensible, and can be further developed and expanded on this basis.
Keywords/Search Tags:distributed system, index, big data, parallel computing, Spark
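
The abstract describes a main control module that exposes index build and query operations while the data itself is spread over several nodes. The following is only a minimal single-process sketch of that idea, not the thesis's implementation: it does not use Spark, and the class name `PartitionedIndex`, its methods, the hash routing, and the sorted-list local index are all assumptions made for illustration. Each in-memory list stands in for one partition of the data.

```python
from bisect import bisect_left, bisect_right


class PartitionedIndex:
    """Hypothetical stand-in for a control module that routes index
    operations to per-partition local indexes (no Spark involved)."""

    def __init__(self, num_partitions=4):
        self.num_partitions = num_partitions
        # Each partition keeps parallel lists of sorted keys and their values.
        self.keys = [[] for _ in range(num_partitions)]
        self.values = [[] for _ in range(num_partitions)]

    def _route(self, key):
        # Assumed routing rule: hash the key to pick the owning partition.
        return hash(key) % self.num_partitions

    def build(self, records):
        # Scatter records to partitions, then sort each partition locally.
        buckets = [[] for _ in range(self.num_partitions)]
        for key, value in records:
            buckets[self._route(key)].append((key, value))
        for pid, items in enumerate(buckets):
            items.sort(key=lambda kv: kv[0])
            self.keys[pid] = [k for k, _ in items]
            self.values[pid] = [v for _, v in items]

    def lookup(self, key):
        # A point query touches only the partition that owns the key.
        pid = self._route(key)
        lo = bisect_left(self.keys[pid], key)
        hi = bisect_right(self.keys[pid], key)
        return self.values[pid][lo:hi]

    def range_query(self, low, high):
        # Hash routing does not preserve key order, so a range query
        # asks every partition and merges the partial results.
        hits = []
        for keys, values in zip(self.keys, self.values):
            lo = bisect_left(keys, low)
            hi = bisect_right(keys, high)
            hits.extend(zip(keys[lo:hi], values[lo:hi]))
        return sorted(hits, key=lambda kv: kv[0])


index = PartitionedIndex(num_partitions=3)
index.build([(k, f"row-{k}") for k in range(10)])
print(index.lookup(7))
print(index.range_query(3, 5))
```

In a real Spark deployment the per-partition work would run in parallel on separate executors; here the loop over partitions is sequential and only shows the division of labor between routing and local lookup.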