Research And Implementation Of Distributed Provenance Storage System

Posted on: 2017-02-27  Degree: Master  Type: Thesis
Country: China  Candidate: A Q Peng  Full Text: PDF
GTID: 2308330485988472  Subject: Computer system architecture
Abstract/Summary:
With the rapid development of cloud computing and big data, the storage and management of big data, which place higher demands on the flexibility, scalability, and concurrency of storage, have become a focus of attention. Many Internet applications produce diverse unstructured data, and traditional relational databases, which use two-dimensional tables to describe data and relationships, are ill-suited to such flexible unstructured data. In response, new storage devices and architectures such as SSDs, NoSQL databases, and distributed storage have emerged to improve the efficiency of storage and access and to reduce storage cost in unstructured-data scenarios. People are often concerned with the lifecycle of data: when it was created, who used it, and how many copies of it exist. Such data, referred to as provenance information, is significant for data management and system security. Provenance information describes the dynamic process by which an object is generated and the interactions between objects; over time, this data grows and the relationships become more complex. How to describe and store large amounts of provenance information effectively, and how to let users access it simply and efficiently, is therefore the subject of this thesis. This thesis designs a high-performance distributed storage system called DBPS. DBPS adopts a multi-level storage architecture, comprising a cache level and a persistent level, built on a center-node distributed architecture. It separates the read cache from the write cache at the cache level and designs provenance-aware data structures and indexes, and it uses a key-value database as the persistent storage engine to improve access efficiency and save storage resources.
Many provenance systems use existing databases, such as relational or graph databases, for storage; the data must then be heavily processed on every read and write, which lowers performance. The experimental results show that DBPS is highly efficient at creating and querying provenance objects but less efficient at updating and deleting them. Because updates and deletions are rare in practice, the overall performance and user experience of DBPS are good.
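The two-level design summarized above (separate read and write caches in front of a key-value persistent engine) can be illustrated with a minimal sketch. This is not the DBPS implementation: the class name `TwoLevelProvenanceStore`, its parameters, the record fields, and the use of an in-memory dict as a stand-in for the key-value database are all assumptions made for illustration, and the center-node distribution and provenance-aware indexes are omitted.

```python
from collections import OrderedDict


class TwoLevelProvenanceStore:
    """Toy two-level store (hypothetical, not DBPS itself)."""

    def __init__(self, read_capacity=2, write_batch=2):
        self.persistent = {}             # stand-in for a key-value database
        self.read_cache = OrderedDict()  # LRU cache serving queries
        self.write_cache = {}            # buffer of records not yet persisted
        self.read_capacity = read_capacity
        self.write_batch = write_batch

    def create(self, object_id, record):
        # Writes go to the write cache only; drop any stale read-cache copy.
        self.read_cache.pop(object_id, None)
        self.write_cache[object_id] = record
        if len(self.write_cache) >= self.write_batch:
            self.flush()

    def flush(self):
        # Persist all buffered records in one batch.
        self.persistent.update(self.write_cache)
        self.write_cache.clear()

    def query(self, object_id):
        # Newest data first: write cache, then read cache, then persistent level.
        if object_id in self.write_cache:
            return self.write_cache[object_id]
        if object_id in self.read_cache:
            self.read_cache.move_to_end(object_id)
            return self.read_cache[object_id]
        record = self.persistent.get(object_id)
        if record is not None:
            self.read_cache[object_id] = record
            if len(self.read_cache) > self.read_capacity:
                self.read_cache.popitem(last=False)  # evict least recently used
        return record


store = TwoLevelProvenanceStore()
store.create("file-1", {"created_by": "alice", "derived_from": []})
store.create("file-2", {"created_by": "bob", "derived_from": ["file-1"]})
print(store.query("file-2")["derived_from"])
```

In this sketch, creation touches only the write buffer and repeated queries are served from the read cache, which is one plausible reason such a design would favor create and query workloads over updates and deletions.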
Keywords/Search Tags: provenance information, storage system, NoSQL, key-value