An Unstructured-data Query Optimization Oriented Storage System

Posted on: 2012-09-14
Degree: Master
Type: Thesis
Country: China
Candidate: H Li
Full Text: PDF
GTID: 2218330362456479
Subject: Computer system architecture
Abstract/Summary:
To address the problem of storing unstructured data, and to give upper-layer applications a storage infrastructure that makes them easier to develop, a storage system oriented toward unstructured-data query optimization has been designed and implemented. The system provides a unified, simple, transparent and secure data access interface with query methods, and it organizes and manages unstructured data effectively to guarantee a low-latency, high-throughput and highly available data service.

The main idea of the system is to borrow the data model and architecture of Bigtable and improve on them. It uses a {key: value} format to extract metadata from unstructured data and build indexes; both the index information and the unstructured data are persisted to a document database. A REST architecture is adopted so that the data access interfaces are independent of operating system and programming language. The system addresses a source of complexity in Bigtable, namely that data processing and control are left to the user. The work focuses on the following aspects:

(1) Based on the NWR model, a balance is struck between consistency and availability, and a hierarchical cache structure with fine-grained data units is built. In addition, a pre-caching mechanism is established according to the relevance of data.
(2) To guarantee eventual consistency, a message queue is introduced to synchronize the cache with persistent storage. The queue is also responsible for synchronization and backup among data server nodes, so that the whole system remains available.
(3) A URL-based digital signature authentication method is adopted to ensure security. Finally, the system provides a complex query syntax to meet diverse user requirements by translating queries written in the key-value format into SQL-like statements.

The implementation is based on a thorough analysis of the system's requirements and characteristics. Experiments show that it provides a stable data storage service even under heavy load. Compared with a file system that stores data as file streams and a relational database system that stores data as BLOBs, throughput and concurrent processing capability increase by about 30%, and the response time is always maintained at 200 ms.
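
The abstract does not say how the NWR model is configured, so the following is only a minimal sketch of the trade-off it refers to. It assumes the commonly used definition: N is the number of replicas, W the number of replicas that must acknowledge a write, and R the number of replicas consulted on a read. The class and method names are hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuorumConfig:
    """Hypothetical container for NWR settings (not from the thesis)."""
    n: int  # total number of replicas
    w: int  # replicas that must acknowledge a write
    r: int  # replicas that must answer a read

    def __post_init__(self):
        # W and R only make sense between 1 and N.
        if not (1 <= self.w <= self.n and 1 <= self.r <= self.n):
            raise ValueError("W and R must each be between 1 and N")

    def reads_see_latest_write(self) -> bool:
        # If R + W > N, every read set overlaps every write set,
        # so a read always reaches at least one up-to-date replica.
        return self.r + self.w > self.n

    def tolerated_write_failures(self) -> int:
        # Writes still succeed while at least W replicas are reachable.
        return self.n - self.w

    def tolerated_read_failures(self) -> int:
        # Reads still succeed while at least R replicas are reachable.
        return self.n - self.r


strict = QuorumConfig(n=3, w=2, r=2)   # overlapping quorums
relaxed = QuorumConfig(n=3, w=1, r=1)  # favors availability and latency
```

With N = 3, the first setting gives overlapping read and write quorums while tolerating one failed replica; the second tolerates two failed replicas per operation but relies on background synchronization, such as the message queue described in the abstract, to converge to a consistent state.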
Keywords/Search Tags: Data Storage, Unstructured-data, Query Optimization, Bigtable