
Research On Indexing And Query Processing In Cloud Computing Systems

Posted on: 2014-05-15
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J B Wang
Full Text: PDF
GTID: 1268330392972594
Subject: Computer software and theory
Abstract/Summary:
Cloud computing is a recently developed computational framework. Clients access cloud resources and submit computing tasks through web browsers, without having to consider how the underlying computing hardware is built. As more and more datasets and services are delivered into the cloud, building efficient data management systems in the cloud becomes an essential task for the research community. Cloud systems provide scalability, which supports large-scale data analysis and highly concurrent transactions. As in existing data management systems, query processing is significant because it has a large effect on service level agreements. Query processing is also important for the various services provided in the cloud, including IaaS (Infrastructure as a Service) and so on. Indexes are well studied as a means of reducing CPU time and disk access operations in data management systems to improve query performance, and they are expected to play the same role in cloud systems. As illustrated above, query processing and indexes are essential topics in cloud systems; however, existing works focus on key-value data in the MapReduce framework, while others still lack attention.

This thesis focuses on indexes and query processing in cloud systems, and designs indexes together with query processing algorithms based on data management techniques, computational theory and algorithmic technology. A wide range of data types and query types are considered in this thesis, and the following achievements are obtained.

First, we propose a multi-dimensional index for cloud systems. Existing works mainly focus on indexes in a single server or a client-server structure. Unfortunately, such works fail to provide efficiency because they introduce a performance bottleneck. This thesis designs a two-layered index structure to prune the search space among computing nodes for query processing. The index reduces the number of computing nodes involved in query processing, and improves I/O efficiency inside a single server.
The initialization and maintenance methods are proposed for the index, together with optimization strategies for improving query throughput. This thesis designs the point query algorithm, the range query algorithm and the kNN query algorithm for cloud systems, including distributed algorithms among computing nodes and optimization strategies inside a single computing node.

Second, we target string similarity search in cloud systems. Existing works focus on query processing within a single server, which incurs main memory overflow and external memory overflow when dealing with big data. To address these problems, we propose a distributed index to support string similarity search in cloud environments. To provide efficient searching in a single node, an external memory index is designed, which adopts multiple filtering techniques and optimization strategies. The external-memory-resident index supports the length filter and the positional filter on disk. This thesis proposes the index construction method. During query processing, asymmetric q-grams are used to reduce the number of inverted lists read from disk. An adaptive algorithm is given to choose inverted lists and seek the tradeoff between two aspects of query cost. The global index partitions the entire string dataset according to the content of the strings, and a char vector space partition method is proposed. In this method, similar strings are partitioned into the same computing nodes, so the number of computing nodes involved in a single query is reduced. The partition method is also used to determine the set of computing nodes that a query needs to access. Simulation results validate the efficiency and effectiveness of our proposed index.

Third, we propose spatial approximate keyword query algorithms for cloud systems. Existing work targets single-server solutions: an exact algorithm is given for in-memory data, while an approximate algorithm is given for disk-resident datasets.
However, a single server fails to provide reasonable throughput due to its limited CPU time and disk bandwidth. Facing these challenges, this thesis gives a two-layered index consisting of a global index and a local index, which works in a shared-nothing cluster to achieve higher query throughput. This thesis designs a novel external memory index as the local index, which returns exact answers from disk efficiently. It is equipped with keyword set signatures and multiple optimization strategies to reduce I/O cost. The global index partitions the entire space, and each computing node in the system maintains a partition. A global index selection algorithm is given. This thesis also provides spatial approximate keyword query algorithms based on edit distance, covering range and nearest neighbor spatial conditions. Experiments in a shared-nothing cluster illustrate the efficiency and effectiveness of our proposed index and query algorithms.

Fourth, we propose multi-dimensional aggregation algorithms for cloud systems. Existing works focus on key-value data in the MapReduce framework, and multi-dimensional data is not well considered. Moreover, the MapReduce framework launches a large number of computing nodes for a single query and consumes a large amount of energy. To address these problems, this thesis designs a solution for answering aggregation queries in the cloud through a two-layered index, which reduces the number of computing nodes involved in a single query and eliminates unnecessary computing time in a single server. The construction and maintenance methods are given, together with a query framework. Following this query framework, this thesis proposes two aggregation algorithms, one under performance-priority mode and one under low-power mode. Two sub-query scheduling problems are defined for the aggregation algorithms given above, and they are proved to be NP-complete. Then, approximation algorithms with reasonable lower bounds are designed.
Theoretical analysis and simulations validate the efficiency and effectiveness of the aggregation algorithms proposed in this thesis.
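
The two-layered index idea that recurs in the abstract (a global layer that prunes computing nodes, and a local layer that answers the query inside each node) can be illustrated with a minimal sketch. This is not the thesis's actual index: the names `TwoLayerIndex`, `Node`, `intersects` and `contains` are hypothetical, the global layer is assumed to be a fixed list of rectangular partitions, and the local layer is a plain list scan standing in for a real local index.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), inclusive


def intersects(a: Rect, b: Rect) -> bool:
    """True if the two closed rectangles share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def contains(r: Rect, p: Point) -> bool:
    """True if point p lies inside the closed rectangle r."""
    return r[0] <= p[0] <= r[2] and r[1] <= p[1] <= r[3]


@dataclass
class Node:
    """One computing node: it owns a region and the points that fall in it."""
    region: Rect
    points: List[Point] = field(default_factory=list)

    def local_range(self, query: Rect) -> List[Point]:
        # Stand-in for the local index: a linear scan (assumption).
        return [p for p in self.points if contains(query, p)]


class TwoLayerIndex:
    """Hypothetical global layer: a list of node regions used for pruning."""

    def __init__(self, regions: List[Rect]) -> None:
        self.nodes = [Node(r) for r in regions]

    def insert(self, p: Point) -> None:
        # Route the point to the first node whose region contains it.
        for node in self.nodes:
            if contains(node.region, p):
                node.points.append(p)
                return
        raise ValueError(f"point {p} is outside every partition")

    def range_query(self, query: Rect) -> Tuple[List[Point], int]:
        # Global layer: keep only nodes whose region intersects the query.
        involved = [n for n in self.nodes if intersects(n.region, query)]
        # Local layer: each involved node answers over its own data.
        results = [p for n in involved for p in n.local_range(query)]
        return results, len(involved)


# Four quadrant partitions of the square [0, 10] x [0, 10].
index = TwoLayerIndex([(0, 0, 5, 5), (5, 0, 10, 5), (0, 5, 5, 10), (5, 5, 10, 10)])
for pt in [(1, 1), (6, 1), (2, 7), (8, 8), (4, 4)]:
    index.insert(pt)

small_hits, small_nodes = index.range_query((0, 0, 4.5, 4.5))
wide_hits, wide_nodes = index.range_query((3, 3, 7, 7))
```

In this sketch the small query touches a single partition, while the wide query overlaps all four. The second value returned by `range_query` makes the effect of global pruning visible as the number of nodes involved.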
Keywords/Search Tags: cloud computing, spatial database, query processing, aggregation