
Design And Implementation Of Unstructured Data Storage System Based On Swift

Posted on: 2019-01-02
Degree: Master
Type: Thesis
Country: China
Candidate: M M Dai
GTID: 2428330599977707
Subject: Computer technology
Abstract/Summary:
Unstructured data has grown explosively. According to an IDC survey report, unstructured data accounts for 80% of corporate data and is still growing at a rate of 60%. Processing and storing massive amounts of unstructured data has therefore become an urgent problem, and cloud storage has become a popular technology for addressing it. Existing cloud storage systems are designed mainly for large files and do not consider the correlation between data. However, most unstructured data files are smaller than 2 MB, so storage efficiency for unstructured data remains relatively low. This paper takes unstructured data as its research object and, building on Swift, the object storage service of the open source cloud computing platform OpenStack, proposes an object-based method that covers both the storage design and the read design for unstructured data. The design is integrated into the Swift storage platform to build an unstructured data storage system and improve unstructured data access performance.

For the storage problem, this paper studies the correlation between data, uses the associated data as input, and clusters it with the BIRCH algorithm to group the data. The test results show that memory usage does not exceed 18 MB on the ProxyNode and 15 MB on the StorageNode. Compared with the original Swift system, memory usage is reduced by about 86%, and when processing data of about 10 KB, the request response time is reduced by almost 50%. These results show that the proposed method can effectively improve the storage efficiency of unstructured data.

For the read problem, this paper studies the mapping information between data and groups and implements three mapping tables: the DataIndex table, the IndexCache table, and the WorkingSet table. Based on the access history, the data that is likely to be accessed next is then prefetched into the cache of the proxy node. The test results show that the hit rate reaches 50% when the number of user requests is 50 and exceeds 80% when the number of user requests exceeds 350. At the same time, the response time for reading data is reduced, with an average request response delay of less than 20 ms. These results show that the proposed method can effectively improve the read efficiency of unstructured data.

Based on these results, this paper designs and implements a prototype unstructured data storage system based on Swift that combines the storage and read techniques for unstructured data. The test results show that the system can efficiently solve the storage and read problems of massive unstructured data.
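
The read-side idea in the abstract (prefetching data likely to be accessed next into the proxy node's cache) can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The class name `GroupPrefetchCache`, the LRU eviction policy, and the rule "on a miss, prefetch every other object in the same group" are assumptions made for illustration. In particular, the abstract describes prefetching driven by access history, and group membership is used here only as a simple stand-in. The `data_index` dictionary loosely plays the role the abstract assigns to the DataIndex table (object to group mapping), and the exact contents of the three tables are not specified in the abstract.

```python
from collections import OrderedDict


class GroupPrefetchCache:
    """Toy proxy-side cache: on a miss, also load the other objects of the same group.

    Hypothetical sketch; names and policies are assumptions, not the thesis's design.
    """

    def __init__(self, data_index, backend, capacity=8):
        self.data_index = data_index   # object name -> group id (assumed mapping)
        self.backend = backend         # object name -> bytes, stands in for storage nodes
        self.capacity = capacity       # maximum number of cached objects
        self.cache = OrderedDict()     # LRU order: least recently used first
        self.groups = {}               # group id -> list of object names
        for name, group_id in data_index.items():
            self.groups.setdefault(group_id, []).append(name)
        self.hits = 0
        self.requests = 0

    def _put(self, name, value):
        # Insert as most recently used, then evict from the LRU end if over capacity.
        self.cache[name] = value
        self.cache.move_to_end(name)
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)

    def get(self, name):
        self.requests += 1
        if name in self.cache:
            self.hits += 1
            self.cache.move_to_end(name)
            return self.cache[name]
        value = self.backend[name]
        # Miss: prefetch the other members of the same group first ...
        for peer in self.groups[self.data_index[name]]:
            if peer != name and peer not in self.cache:
                self._put(peer, self.backend[peer])
        # ... then cache the requested object last so it is the most recent entry.
        self._put(name, value)
        return value

    def hit_rate(self):
        return self.hits / self.requests if self.requests else 0.0


# Usage: two groups of four small objects, grouped by name prefix (made-up data).
backend = {f"img{i}": b"x" * 10 for i in range(4)}
backend.update({f"log{i}": b"y" * 10 for i in range(4)})
data_index = {name: name[:3] for name in backend}

cache = GroupPrefetchCache(data_index, backend, capacity=4)
for name in ["img0", "img1", "img2", "log0", "log1", "img0"]:
    cache.get(name)
print(f"hit rate: {cache.hit_rate():.2f}")
```

In this sketch a single miss warms the cache for the whole group, so later requests for related small objects are served from the proxy without another trip to the storage nodes. The trade-off is that a group larger than the cache capacity evicts its own members, so the capacity would need to be sized against the group size.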
Keywords/Search Tags: OpenStack Swift, unstructured data, data storage, data reading