Research On Data Model And Index Technology In The Cloud

Posted on:2014-02-11

Degree:Master

Type:Thesis

Country:China

Candidate:C J Sun

Full Text:PDF

GTID:2248330395484135

Subject:Computer application technology

Abstract/Summary:

With the rapid development of computer and Internet technologies, the amount of data has expanded rapidly. Traditional data models and index technologies can no longer satisfy the requirements of massive data management, which poses a major challenge for traditional data management. As a new computing platform, cloud computing has attracted wide attention from academia and the business community, and research on data models and index technology based on the characteristics and requirements of the cloud computing environment has become an important field. The main contributions of this thesis are as follows:

(1) The basic concepts, characteristics, and development of cloud computing are summarized, and the existing data model and index technologies for cloud environments are surveyed and analyzed.

(2) The typical key-value data model in cloud environments cannot effectively support varied user queries such as range queries and non-primary-key queries. A new data model, Key-MultiValue, is therefore proposed. Key-MultiValue supports non-primary-key queries by partitioning the value and dynamically changing the partitioned attributes according to query hotspots. The P-Ring structure is adopted to partition the data, which effectively supports range queries. Moreover, a node performance state parameter is introduced into P-Ring to address its failure to take into account the differences in performance among storage nodes. Experiments and result analysis show that the data model effectively supports range queries and non-primary-key queries and also improves the query success rate and query throughput.

(3) Cloud computing systems cannot effectively support similarity search because they lack efficient index structures, and as dimensionality increases, existing tree-like index structures suffer from the “curse of dimensionality”. A novel indexing scheme, VF-CAN, is proposed. VF-CAN integrates a CAN-based routing protocol with an improved VA-File index. The scheme has two index levels: a global index and a local index. The local index, VAK-File, is built over the data in each storage node; it is the result of k-means clustering of the VA-File approximation vectors according to their degree of proximity. In the global index, storage nodes are organized into a CAN overlay network, and to reduce computation cost, only the clustering information of the local index is published to the overlay network through the CAN interface. Experimental results show that VF-CAN reduces index storage space and effectively improves query performance.
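
The VA-File idea that the local index builds on can be illustrated with a minimal sketch. This is not the thesis's VAK-File or VF-CAN implementation: it omits the k-means clustering step and the CAN overlay, and the function names, the 2-bit grid, and the assumption that coordinates lie in [0, 1] are all hypothetical choices made for illustration. It shows only how compact per-dimension cell approximations can filter candidates before exact distances are computed.

```python
import math

def approximate(vector, bits=2, lo=0.0, hi=1.0):
    """Quantize each coordinate into one of 2**bits equal-width cells.

    Assumes every coordinate lies in [lo, hi] (an illustrative assumption).
    """
    cells = 1 << bits
    width = (hi - lo) / cells
    return tuple(min(cells - 1, max(0, int((x - lo) / width))) for x in vector)

def lower_bound(query, approx, bits=2, lo=0.0, hi=1.0):
    """Smallest possible Euclidean distance from query to any point in the cell."""
    cells = 1 << bits
    width = (hi - lo) / cells
    total = 0.0
    for q, c in zip(query, approx):
        cell_lo = lo + c * width
        cell_hi = cell_lo + width
        if q < cell_lo:
            total += (cell_lo - q) ** 2
        elif q > cell_hi:
            total += (q - cell_hi) ** 2
        # If q falls inside the cell, this dimension contributes nothing.
    return math.sqrt(total)

def nearest(query, vectors, bits=2):
    """Scan approximations first; compute exact distances only for survivors.

    Returns (index of nearest vector, its distance, number of exact checks).
    """
    approxs = [approximate(v, bits) for v in vectors]
    bounds = [lower_bound(query, a, bits) for a in approxs]
    order = sorted(range(len(vectors)), key=lambda i: bounds[i])
    best, best_dist, exact_checks = None, math.inf, 0
    for i in order:
        # Every remaining candidate has a lower bound at least this large,
        # so none of them can beat the best exact distance found so far.
        if bounds[i] >= best_dist:
            break
        exact_checks += 1
        d = math.dist(query, vectors[i])
        if d < best_dist:
            best, best_dist = i, d
    return best, best_dist, exact_checks

vectors = [(0.1, 0.1), (0.9, 0.9), (0.15, 0.2), (0.5, 0.5)]
best, best_dist, exact_checks = nearest((0.12, 0.12), vectors)
print(best, exact_checks)
```

In this toy run only the vectors that share the query's cell need an exact distance computation; the others are pruned from their approximations alone. That filtering step is what makes a compact approximation file attractive in high dimensions.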

Keywords/Search Tags:

Cloud Computing, Data Model, Key-value, Index Structure, K-means Clustering

Related items

1	Research On Cloud Computing Search Engine Design And Parallelization K-means Clustering Algorithms For Big Data
2	The Key Research Of Clustering Algorithm Parallelization On The Platform Of Cloud Computing
3	Research On K-Means Clustering Algorithm Based On Hadoop Cloud Computing Platform
4	Research On Multidimensional Data Index In Cloud Computing System
5	Clustering Algorithm Based On The Background Of Big Data
6	Research Of Cluster Scheduling Algorithm In Cloud Computing Based On Logistics Data
7	Research On Simplified Algorithm Of 3D Point Cloud Data Based On Grey Wolf Optimized K-means Clustering Algorithm
8	Research Of Clustering Mining Algorithm Oriented Big Data
9	Improved K-means Algorithm And Its Application In The Cloud Task Allocation Strategy
10	Research On Data Placement Strategy For Data-intensive Applications In Cloud