Design And Implementation Of A Distributed Hybrid Index Structure Based On Spark

Posted on:2021-01-09

Degree:Master

Type:Thesis

Country:China

Candidate:S Zhan


GTID:2428330611498039

Subject:Computer Science and Technology

Abstract/Summary:


With the rapid development of the Internet, users have generated massive amounts of data, and technologies for storing and retrieving these data have become increasingly important. Although traditional single-machine storage keeps data processing simple, its storage cost is high and its storage space is limited, so a distributed solution is urgently needed. When dealing with large and complex data, appropriate indexes can effectively speed up data retrieval. A distributed index method is therefore needed to serve big data processing, and this thesis proposes a distributed hybrid index structure to improve the performance of big data retrieval.

The index structure is guided by distributed systems theory and uses the popular Spark framework as its distributed computing basis. Spark's parallel computing model distributes big data across multiple nodes for parallel processing, and together with the in-memory computation of RDDs, this greatly speeds up computation. The index structure is designed around these features and exposed as a service: the main control module provides an interface for index operations to external applications, which users can call to build indexes and query data.

The system also adds a web display function that wraps the system's complex interfaces, forming a web front end and a back-end index data management system that supports the management of users and index data. It can also draw a heat map for data heat analysis and a tree diagram of data relationships, making the distribution of the data and the relationships among them easier to observe. Testing showed that the system makes distributed index establishment more convenient for users, completes index data retrieval, and can provide index functions for large data warehouses. The system is also robust and extensible, and can serve as a basis for further development.
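The abstract does not specify the internals of the hybrid index. As a rough illustration of how index building and querying can be split across nodes, the following Python sketch assumes a two-level design: a global range index (a list of split points, similar in spirit to range partitioning) routes each key to one partition, and each partition keeps its own sorted local index. The class name `TwoLevelIndex`, its methods, and the partitioning scheme are hypothetical and not taken from the thesis; plain Python lists stand in for RDD partitions and threads stand in for executors.

```python
from bisect import bisect_left, bisect_right
from concurrent.futures import ThreadPoolExecutor


class TwoLevelIndex:
    """Hypothetical sketch: a global range index that routes keys to
    partitions, plus a sorted local index inside each partition."""

    def __init__(self, records, num_partitions=4):
        records = list(records)  # (key, value) pairs
        # Global level: split points chosen evenly from the sorted distinct
        # keys (assumption: keys are mutually comparable).
        uniq = sorted({k for k, _ in records})
        step = len(uniq) / num_partitions
        self.bounds = (
            sorted({uniq[int(step * i)] for i in range(1, num_partitions)})
            if uniq else []
        )
        # "Shuffle": send every record to the partition that owns its key.
        buckets = [[] for _ in range(len(self.bounds) + 1)]
        for key, value in records:
            buckets[self._route(key)].append((key, value))
        # Local level: sort each partition independently. Threads stand in
        # for Spark executors; this shows the structure, not a real speedup.
        with ThreadPoolExecutor() as pool:
            self.parts = list(pool.map(self._build_local, buckets))

    def _route(self, key):
        # Partition id = number of split points <= key.
        return bisect_right(self.bounds, key)

    @staticmethod
    def _build_local(bucket):
        # Stable sort by key only, so equal keys keep their insertion order.
        bucket = sorted(bucket, key=lambda kv: kv[0])
        return [k for k, _ in bucket], [v for _, v in bucket]

    def get(self, key):
        """Point query: route to one partition, then binary-search it."""
        keys, values = self.parts[self._route(key)]
        return values[bisect_left(keys, key):bisect_right(keys, key)]

    def range_query(self, low, high):
        """Range query: visit only partitions overlapping [low, high]."""
        result = []
        for pid in range(self._route(low), self._route(high) + 1):
            keys, values = self.parts[pid]
            i, j = bisect_left(keys, low), bisect_right(keys, high)
            result.extend(zip(keys[i:j], values[i:j]))
        return result


if __name__ == "__main__":
    idx = TwoLevelIndex([(k, f"row-{k}") for k in range(100)], num_partitions=4)
    print(idx.bounds)                       # [25, 50, 75]
    print(idx.get(42))                      # ['row-42']
    print(len(idx.range_query(48, 52)))     # 5
```

In this sketch a point query touches a single partition, while a range query visits only the partitions whose key ranges overlap the requested interval. In an actual Spark deployment, the partition-and-sort step would run on the executors, for example through a range-partitioned `sortByKey` on a pair RDD.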

Keywords/Search Tags:

distributed system, index, big data, parallel computing, Spark


Related items

1	Design And Implementation Of Voltage Index Management System Based On Spark Platform
2	Research On Memory Data Management Technology In Spark
3	A System For Distributed MD Data Analysis Based On Spark
4	Study Of Distributed And Parallel Index
5	Research On Apache Spark Distributed Parallel Computing Framework Optimization Technology
6	Parallel Research On Data Mining Algorithm Based On YARN And Spark Framework
7	Research On Optimization Of Map Reduce For Interactive Analysis On Big Data
8	Research Of Query Processing Technology For Geospatial Big Data Based On Spark
9	Research On Parallel FM-Index Algorithm Based On Spark In RNA-Seq Reads Mapping
10	Research And Implementation Of Memory Optimization Based On Parallel Computing Engine Spark