| Since its inception, the Semantic Web has been applied in a variety of fields, such as the life sciences, statistics, finance, open science, and health. The Semantic Web uses the Resource Description Framework (RDF) as its data format for describing information on the Web; the RDF data model represents information as subject-predicate-object triples, and SPARQL is the Semantic Web's standard language for querying RDF datasets. However, as RDF datasets grow larger and queries over triple patterns become more complex, SPARQL query efficiency gradually decreases, so performing efficient queries on massive RDF datasets has become a challenging problem. To address the excessive query time of current SPARQL engines, this paper proposes a SPARQL query engine based on a heuristic algorithm: it uses the Spark-MMAS-LKH optimization algorithm to reorder the triple patterns, computing a cost matrix over the triple patterns in order to find the best join order. The proposed SPARQL query engine based on the Spark-MMAS-LKH algorithm is divided into five parts. The Spark-MMAS-LKH algorithm forms the core of the engine's query optimization layer, where it is responsible for reordering the triple patterns. Optimization in this layer proceeds in two steps. First, the initial weight matrix (i.e., the cost matrix) of the triple patterns is constructed, with the initial weights obtained from the estimated cardinality of each triple pattern and the estimated join value between triple patterns. Second, the weight matrix is passed as a parameter to the Spark-MMAS-LKH hybrid optimization algorithm, which combines the Max-Min Ant System (MMAS) and the Lin-Kernighan-Helsgaun (LKH) heuristic in relay mode (MMAS runs first and LKH refines its result) and uses the RDD operators of the distributed framework Spark to accelerate the iterations of MMAS-LKH; this optimizes the weight matrix and yields the best join order of the triple patterns. To evaluate the proposed engine, the SPARQL query engine based on the Spark-MMAS-LKH algorithm is compared with other optimized engines and with the original, unoptimized engines on the public LUBM100 dataset. The comparison results show that the proposed engine improves query performance on large-scale RDF datasets and achieves the expected results. |
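The first optimization step, building a weight (cost) matrix over the triple patterns, can be illustrated with a minimal sketch. The abstract does not give the cost formula, so everything here is an assumption: `variables` and `build_cost_matrix` are hypothetical names, the cardinalities are made-up illustrative numbers rather than LUBM100 statistics, and the rule used (a join between patterns that share a variable is approximated by the smaller input, while patterns with no shared variable form a Cartesian product) is a common simplification, not the paper's estimator.

```python
def variables(pattern):
    """Return the variables (terms starting with '?') of a triple pattern."""
    return {term for term in pattern if term.startswith("?")}


def build_cost_matrix(patterns, cardinality):
    """Hypothetical weight matrix: cost[i][j] estimates joining pattern i with pattern j."""
    n = len(patterns)
    cost = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # no self-joins; the diagonal stays 0
            ci, cj = cardinality[i], cardinality[j]
            if variables(patterns[i]) & variables(patterns[j]):
                # Shared variable: assume the join result is roughly the smaller input.
                cost[i][j] = float(min(ci, cj))
            else:
                # No shared variable: the join degenerates into a Cartesian product.
                cost[i][j] = float(ci * cj)
    return cost


# Illustrative LUBM-style patterns with made-up cardinality estimates.
patterns = [
    ("?x", "rdf:type", "ub:GraduateStudent"),
    ("?x", "ub:takesCourse", "?c"),
    ("?c", "rdf:type", "ub:Course"),
]
cardinality = [1000, 5000, 200]
cost = build_cost_matrix(patterns, cardinality)
for row in cost:
    print(row)
```

Because this toy rule is symmetric, `cost[i][j] == cost[j][i]`; an estimator that accounts for bound subjects or objects and for execution direction could yield an asymmetric matrix instead.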
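The second step treats join ordering much like a travelling-salesman problem: find an order over all triple patterns that minimizes the summed pairwise costs from the weight matrix. The sketch below is an assumption-laden illustration, not the paper's Spark-MMAS-LKH implementation. It runs a basic Max-Min Ant System (pheromone trails clamped to `[tau_min, tau_max]`, with only the best-so-far order depositing pheromone) and then, in relay fashion, hands the best order to a local search. A plain 2-opt segment-reversal search is swapped in for LKH, which is far more elaborate, and the Spark RDD parallelism is omitted. All function names and parameter values (`mmas_join_order`, `n_ants`, `rho`, and so on) are hypothetical.

```python
import random


def path_cost(order, cost):
    """Total cost of a join order, summing the costs of consecutive pattern pairs."""
    return sum(cost[a][b] for a, b in zip(order, order[1:]))


def construct_order(cost, tau, alpha, beta, rng):
    """One ant builds a join order, picking each next pattern by pheromone and heuristic."""
    n = len(cost)
    current = rng.randrange(n)
    order, unvisited = [current], set(range(n)) - {current}
    while unvisited:
        candidates = sorted(unvisited)
        weights = [
            (tau[current][j] ** alpha) * ((1.0 / (1.0 + cost[current][j])) ** beta)
            for j in candidates
        ]
        current = rng.choices(candidates, weights=weights)[0]
        order.append(current)
        unvisited.remove(current)
    return order


def two_opt(order, cost):
    """Segment-reversal local search; a simple stand-in for LKH, not LKH itself."""
    best = list(order)
    improved = True
    while improved:
        improved = False
        for i in range(len(best) - 1):
            for j in range(i + 2, len(best) + 1):
                candidate = best[:i] + best[i:j][::-1] + best[j:]
                if path_cost(candidate, cost) < path_cost(best, cost):
                    best, improved = candidate, True
    return best


def mmas_join_order(cost, n_ants=10, n_iter=50, alpha=1.0, beta=2.0, rho=0.1, seed=0):
    """MMAS over join orders, followed (relay style) by 2-opt refinement of the best order."""
    rng = random.Random(seed)
    n = len(cost)
    tau = [[1.0] * n for _ in range(n)]
    best_order, best_cost = None, float("inf")
    for _ in range(n_iter):
        orders = [construct_order(cost, tau, alpha, beta, rng) for _ in range(n_ants)]
        iter_best = min(orders, key=lambda o: path_cost(o, cost))
        if path_cost(iter_best, cost) < best_cost:
            best_order, best_cost = iter_best, path_cost(iter_best, cost)
        # Pheromone bounds derived from the best-so-far cost, as in MMAS.
        tau_max = 1.0 / (rho * best_cost) if best_cost > 0 else 1.0
        tau_min = tau_max / (2.0 * n)
        deposit = 1.0 / best_cost if best_cost > 0 else 1.0
        # Evaporate everywhere, then let only the best-so-far order deposit pheromone.
        for a in range(n):
            for b in range(n):
                tau[a][b] *= 1.0 - rho
        for a, b in zip(best_order, best_order[1:]):
            tau[a][b] += deposit
            tau[b][a] += deposit
        # Clamp every trail into [tau_min, tau_max].
        for a in range(n):
            for b in range(n):
                tau[a][b] = min(tau_max, max(tau_min, tau[a][b]))
    # Relay step: the MMAS result is handed to the local search for refinement.
    refined = two_opt(best_order, cost)
    return refined, path_cost(refined, cost)


# Toy symmetric cost matrix for three triple patterns.
cost = [[0, 1000, 200000], [1000, 0, 200], [200000, 200, 0]]
order, total = mmas_join_order(cost)
print(order, total)
```

Ant constructions within one iteration are independent of each other, which is what makes them a natural candidate for distribution; in PySpark one might, for example, map ant constructions over `sc.parallelize(range(n_ants))`, although how the paper actually assigns work to RDD operators is not specified in the abstract.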