Research On The Method Of File Search And Management Based On Provenance Relationships In Cloud Storage Systems

Posted on:2017-08-07

Degree:Doctor

Type:Dissertation

Country:China

Candidate:J J Liu

GTID:1318330485950835

Subject:Computer system architecture

Abstract/Summary:

With the wide use of network services such as cloud backup, cloud storage, and cloud video sharing, cloud storage systems keep growing in scale, which poses significant challenges to file access performance. Before accessing files, users must identify the files they need and where those files are located, so fast file search is essential. However, current metadata search methods cannot deliver high-performance file search, and more relationships among files need to be mined to improve them. Most data in cloud storage systems is cold data stored on low-cost, low-performance devices, yet some applications require high-performance access to cold data. File management therefore needs to improve, with a high-performance structure for file distribution and metadata indexing in cold data storage. Videos currently account for 65 percent of the data and network traffic on the Internet, and the large number of near-duplicate videos among them incurs massive network and storage overheads. For this specific kind of data, mining the relationships among videos can improve storage space efficiency and video access speed. Mining the relevance among files is therefore important for improving file access performance.

In cloud storage systems, data correlations exist among files that have provenance relationships. Data correlation means that the files have the same or similar content. The correlations mined from provenance relationships include strong content correlation, attribute correlation, correlation of read/write characteristics, and weak content correlation. In addition, because the provenance data of a file records all processes and files that affect its final state, analyzing provenance data reveals more file relationships in the space dimension and the changes of those relationships in the time dimension, which improves the accuracy of file relevance measurement. To improve the efficiency of file access, we propose three optimization methods that exploit these features of file relevance.

To address the problem that the growing scale of cloud storage systems degrades file metadata search performance, we propose PROMES, a metadata search method based on the correlations mined from files that have provenance relationships. PROMES adds a relationship graph search that narrows the search range and thus speeds up search in the metadata index tree, and it uses the time factor of relationships and the weights of files to improve the accuracy of file correlation measurement in the relationship graph. Metadata search in PROMES is split into three phases: (i) using a correlation-aware metadata index tree to identify several files as seeds, most of which satisfy the query; (ii) using the seeds to find the remaining query results via relationship graph search; and (iii) refining and reranking the complete set of search results. PROMES offers high query accuracy and low latency because provenance analysis yields a tight, lightweight indexing of files in the relationship graph. Experimental results show that PROMES reduces query delay by one to two orders of magnitude compared with the traditional index tree and improves the accuracy of metadata search.

Cloud storage service providers usually store cold data files and their metadata on low-cost, low-performance devices, which limits the efficiency of file access. To address this problem, we present a data distribution and metadata index mechanism for cold data based on the similarity of file attributes and data access derived from the provenance relationships of files. It mines the access similarity among files with provenance relationships to adjust the distribution of cold data, reducing access time and saving energy, and mines the metadata similarity among such files to group file metadata, reducing metadata search time. The mechanism comprises two schemes: Prodi, a data distribution scheme based on the similarity of file access characteristics derived from provenance relationships, and P-index, an index scheme for cold data based on the similarity of metadata derived from provenance relationships. Experimental results show that Prodi saves about 25% of energy and that P-index answers queries one to two orders of magnitude faster than current metadata index schemes.

To address the problem that the large number of near-duplicate videos degrades the user experience and consumes more service-provider resources, we propose Provis, a video compression and transmission method based on the content differences among files that have provenance relationships. Provis exploits two characteristics of video provenance: first, a video's provenance can be used to rebuild the video; second, the provenance, which records the content differences among videos with provenance relationships, is smaller than the video file itself. Provis therefore stores video provenance in place of the videos to improve space efficiency, and uploads provenance in place of the videos to speed up uploads and reduce network cost. Experimental evaluation on two video sets shows that Provis achieves significant space savings and reduces video upload overhead, while the storage cost of the provenance graph and the delay of video reconstruction remain acceptable to users.

In summary, this thesis addresses the new challenges that the expanding scale of cloud storage systems poses to the performance of file search and management. We optimize file access by mining the various correlations among files that have provenance relationships, and propose a series of models and methods that improve file access performance in storage systems and provide theoretical and technical support for the widespread use of provenance relationships.
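The three-phase search that the abstract describes for PROMES can be illustrated with a minimal Python sketch. It is not the thesis implementation: the names `FileMeta`, `find_seeds`, `expand`, and `search` are hypothetical, a linear scan stands in for the correlation-aware index tree, and the half-life decay is only one assumed way to model the "time factor" of a relationship. Edge weights are assumed to lie in (0, 1] and timestamps are assumed not to exceed `now`, so scores never grow along a path.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileMeta:
    """Hypothetical metadata record; the thesis does not specify these fields."""
    name: str
    owner: str
    size: int


def find_seeds(index, predicate, max_seeds):
    """Phase 1: pick a few matching files as seeds (a scan stands in for the index tree)."""
    seeds = []
    for meta in index:
        if predicate(meta):
            seeds.append(meta.name)
            if len(seeds) == max_seeds:
                break
    return seeds


def expand(graph, seeds, now, half_life, min_score):
    """Phase 2: follow provenance edges from the seeds, scoring by time-decayed weight."""
    scores = {seed: 1.0 for seed in seeds}
    frontier = list(seeds)
    while frontier:
        current = frontier.pop()
        for neighbour, weight, timestamp in graph.get(current, []):
            # Assumed time factor: older relationships count for less.
            decay = 0.5 ** ((now - timestamp) / half_life)
            score = scores[current] * weight * decay
            if score >= min_score and score > scores.get(neighbour, 0.0):
                scores[neighbour] = score
                frontier.append(neighbour)
    return scores


def search(index, graph, predicate, now, max_seeds=2, half_life=10.0, min_score=0.1):
    """Phase 3: drop candidates that fail the query, then rank by score."""
    by_name = {meta.name: meta for meta in index}
    seeds = find_seeds(index, predicate, max_seeds)
    scores = expand(graph, seeds, now, half_life, min_score)
    hits = [n for n in scores if n in by_name and predicate(by_name[n])]
    return sorted(hits, key=lambda n: (-scores[n], n))
```

In this sketch `graph` maps a file name to a list of `(neighbour, weight, timestamp)` tuples. Only the seed lookup touches the full index; the remaining results come from the much smaller graph neighbourhood, which is the intuition behind the reduced search range claimed in the abstract.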

Keywords/Search Tags:

Cloud Storage Systems, Provenance Relationship, Metadata Index, Video Compression, Cold Data, File Relocation

Related items

1	An Optimization Algorithm Of Cloud Storage Based On Data Dependency Relationship
2	Parallel Computation For File Metadata Cube In Cloud Computing Environments
3	Metadata Management For Parallel File Systems
4	Research On Parallel File Systems Based On Heterogeneous Hierarchical Storage
5	The Design Of The Secure-private-cloud-storage-based File System's Data Organization Model
6	Research On Data Provenance Scheme In Cloud Storage Based On Blockchain
7	Xml-based The Origin Calculated And Origin Storage Research
8	The Design And Implementation Of A Cloud Storage Service Based File System
9	Research On Distributed File System Based On Dynamic Multiple Center On Cloud Storage
10	Metadata Management Optimization In Distributed File Systems