Research On Key Technologies Of Information Lifecycle Management In Content Aware Storage System

Posted on: 2012-04-12
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X J Nie
Full Text: PDF
GTID: 1118330335455065
Subject: Computer system architecture
Abstract/Summary:
Intelligent storage systems need to integrate application-layer functions, such as self-management, data security, and information retrieval, into the storage layer. However, Information Lifecycle Management (ILM) cannot be integrated into traditional storage systems because they lack the file-level information needed at the various stages of ILM. Content Aware Storage (CAS), which is based on the XAM specification, supports such integration. By wrapping file-level information into content metadata, CAS can supply the information that data processing in ILM requires, which provides the basis for integrating ILM into storage systems.

This dissertation proposes several key technologies for integrating ILM stages into CAS, covering information integration, content classification, tiered storage, data backup, and information archival. The main work is as follows.

An information integration model based on content metadata is proposed. A content metadata specification is defined according to the requirements of ILM, covering the definition, extraction, representation, and transportation of content metadata. Information is integrated in terms of both outer format and inner semantics. A CAS prototype that supports the content metadata specification is designed and developed. The experimental results show that information integration degrades I/O performance very little.

A content-metadata-oriented information classification algorithm is proposed. A computing model for the similarity between content metadata is designed, which overcomes the limitation that content metadata contain too few characteristic words. The model constructs a similarity matrix for characteristic words based on the explicit relations in the training sample file collection, then calculates the implicit relations with a matrix smoothing algorithm and obtains a set of linearly independent vectors, from which the characteristic vectors of content metadata are calculated.
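The idea of relating characteristic words through a similarity matrix can be illustrated with a small sketch. It is only one plausible reading of the abstract: the names `word_similarity`, `characteristic_vector`, and `cosine` are hypothetical, the similarity measure (normalized document co-occurrence) is an assumption, and the dissertation's matrix smoothing step and its construction of linearly independent vectors are not reproduced here.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def word_similarity(training_docs, vocab):
    """Build a word-by-word similarity matrix from explicit co-occurrence.

    Assumption: two words are related when they appear in the same
    training document; the score is co-occurrence count divided by the
    geometric mean of the two document frequencies.
    """
    index = {word: i for i, word in enumerate(vocab)}
    n = len(vocab)
    doc_freq = [0] * n
    co = [[0] * n for _ in range(n)]
    for doc in training_docs:
        present = sorted({index[w] for w in doc if w in index})
        for i in present:
            doc_freq[i] += 1
        for i, j in combinations(present, 2):
            co[i][j] += 1
            co[j][i] += 1
    sim = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and co[i][j]:
                sim[i][j] = co[i][j] / sqrt(doc_freq[i] * doc_freq[j])
    return sim


def characteristic_vector(words, vocab, sim):
    """Expand a short list of metadata words into a dense vector.

    Each word contributes its whole similarity row, so related words
    that are absent from the metadata still receive some weight.
    """
    index = {word: i for i, word in enumerate(vocab)}
    counts = Counter(w for w in words if w in index)
    vec = [0.0] * len(vocab)
    for word, count in counts.items():
        row = sim[index[word]]
        for j, value in enumerate(row):
            vec[j] += count * value
    return vec


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

With this expansion, two metadata records that share no word can still come out as similar when their words co-occur in the training collection, which is the effect the similarity model is meant to achieve for sparse metadata.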
The data classifier is constructed from the characteristic vectors and the K-Means clustering algorithm. The experimental results show that this classification algorithm achieves higher accuracy and mutual information than traditional classification algorithms and significantly reduces computing time.

A content-metadata-driven tiered storage model is proposed, comprising application-requirement-based tiered storage and cost-requirement-based tiered storage. The former satisfies the application requirements of information, such as backup, archival, security, and access control; the latter reduces storage cost while guaranteeing overall I/O performance. An adaptive data migration algorithm based on migration speed control is proposed, which minimizes the negative impact of migration I/O on normal I/O. The experimental results show that tier computing and data migration do not degrade the performance of the storage system, while the storage cost of information is reduced.

A data de-duplication algorithm based on content characteristics is proposed. By introducing a candidate chunk boundary histogram, the algorithm takes into account the differences between file types and uses the histogram to optimize the key parameters of the traditional de-duplication algorithm. The key idea is to trade the redundancy among files of different types for that among files of the same type. A file system, TDFS, is proposed to store the variable-length chunks. The experimental results show that the algorithm improves the data compression ratio by 9.0% on average on some special data sets.

An information archival model based on content metadata is proposed. By introducing content metadata tags that support the OAIS specification, the model achieves the logical preservation of information. By modifying disk functions and the responses to iSCSI commands, the model achieves a disk-based soft WORM and the physical preservation of information.
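The de-duplication step described above stores variable-length chunks whose boundaries depend on content, with chunking parameters that differ by file type. The sketch below illustrates that general technique with a gear-style rolling hash. It is not the dissertation's algorithm: the candidate chunk boundary histogram and TDFS are not reproduced, and `GEAR`, `PARAMS_BY_TYPE`, `split_chunks`, and `DedupStore` are hypothetical names with placeholder parameter values.

```python
import hashlib
import random

# Fixed seed so that chunk boundaries are reproducible across runs.
_rng = random.Random(2012)
GEAR = [_rng.getrandbits(32) for _ in range(256)]

# Placeholder per-type parameters (assumed values, not tuned results).
PARAMS_BY_TYPE = {
    "text": {"mask": 0x3F, "min_size": 16, "max_size": 512},
    "binary": {"mask": 0xFF, "min_size": 64, "max_size": 2048},
}


def split_chunks(data, mask, min_size, max_size):
    """Cut data where the rolling hash matches the mask (content-defined)."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Gear hash: older bytes shift out, so the masked low bits depend
        # only on the most recent few bytes of content.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        size = i - start + 1
        if (size >= min_size and (h & mask) == 0) or size >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


class DedupStore:
    """Keeps each distinct chunk once, keyed by its SHA-256 digest."""

    def __init__(self):
        self.chunks = {}

    def put(self, data, file_type):
        """Store data and return its recipe (ordered list of chunk digests)."""
        recipe = []
        for chunk in split_chunks(data, **PARAMS_BY_TYPE[file_type]):
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)
            recipe.append(digest)
        return recipe

    def get(self, recipe):
        """Rebuild the original bytes from a recipe."""
        return b"".join(self.chunks[d] for d in recipe)

    def stored_bytes(self):
        return sum(len(c) for c in self.chunks.values())
```

Because boundaries are chosen from content rather than fixed offsets, a small insertion near the start of a file typically changes only the nearby chunks, so most later chunks can still be matched against chunks that are already stored.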
A key-based secure destruction scheme is proposed, which encrypts archived information and deletes the key when the preservation period expires. A time-based management scheme for encryption keys is proposed, which significantly reduces the complexity of key management. The experimental results show that the archival model satisfies both functional and performance requirements.

As the experiments show, the content aware storage system can effectively resolve the lack of file-level semantics in traditional storage systems. By building the key data processing stages of ILM on content metadata, the complexity of integrating ILM into storage systems can be greatly reduced and data I/O performance can be improved, which satisfies the requirements of intelligent storage systems.
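The key-based destruction and time-based key management mentioned in the archival model can be illustrated with a minimal sketch. It rests on two assumptions not stated in the abstract: that one key is shared by all records expiring in the same year, and that deleting this key is what makes those records unrecoverable. `ArchiveKeyManager` and its methods are hypothetical names, and the HMAC-based keystream is only a standard-library stand-in for a real authenticated cipher such as AES-GCM.

```python
import hashlib
import hmac
import secrets


def _keystream_xor(key, nonce, data):
    """XOR data with an HMAC-SHA256 keystream (illustration only)."""
    out = bytearray()
    for block in range((len(data) + 31) // 32):
        pad = hmac.new(key, nonce + block.to_bytes(8, "big"), hashlib.sha256).digest()
        chunk = data[block * 32:(block + 1) * 32]
        out.extend(x ^ y for x, y in zip(chunk, pad))
    return bytes(out)


class ArchiveKeyManager:
    """One key per expiry year, so key count grows with time, not with files."""

    def __init__(self):
        self._keys = {}  # expiry year -> 32-byte key

    def encrypt(self, data, expiry_year):
        """Encrypt data under the key of its expiry year."""
        if expiry_year not in self._keys:
            self._keys[expiry_year] = secrets.token_bytes(32)
        nonce = secrets.token_bytes(16)  # fresh nonce per record
        ciphertext = _keystream_xor(self._keys[expiry_year], nonce, data)
        return expiry_year, nonce, ciphertext

    def decrypt(self, record):
        """Decrypt a record, or fail if its key has been destroyed."""
        expiry_year, nonce, ciphertext = record
        key = self._keys.get(expiry_year)
        if key is None:
            raise KeyError("retention period over: key destroyed")
        return _keystream_xor(key, nonce, ciphertext)

    def destroy_expired(self, current_year):
        """Delete every key whose expiry year has passed; return those years."""
        expired = sorted(y for y in self._keys if y < current_year)
        for year in expired:
            del self._keys[year]
        return expired
```

Grouping keys by expiry period means that a single key deletion retires every record in that period, without touching the stored ciphertext, which fits a WORM-style medium where the data itself cannot be overwritten.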
Keywords/Search Tags: Information Lifecycle Management, Content Aware Storage System, Content Metadata, Information Integration, Content Classification, Tiered Storage, Data De-duplication, Information Archive