
Research On The High Efficient Storage Management Of Provenance And Its Application In Security Area

Posted on: 2014-07-21
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y L Xie
GTID: 1228330425973346
Subject: Computer system architecture
Abstract/Summary:
With data being generated at an explosive rate every day, the demand for storage capacity has grown from the petabyte (PB) and exabyte (EB) scale to "Big Data". Although new devices and instruments are continually being developed and new storage architectures are constantly being proposed, our ability to analyze and understand massive data has stagnated. For instance, when we retrieve important data from the cloud, we may ask: where did these data come from, who has used them, and are they reliable and secure? Provenance, a kind of metadata that records the history of a data object, can answer exactly these questions: how a data object was created, which operations have modified it, and how the ancestries of two data objects differ. In the systems area, the provenance of a data object is the set of all processes and data that affected its final state. Because provenance precisely discloses the history, or generation process, of a data object, it has found increasingly wide use. Scientists use it to validate experimental data sets, and it has been used to improve desktop search efficiency and to analyze system intrusions. It is also being applied to auditing, deduplication, and distributed security. However, few studies analyze the characteristics of provenance itself. For instance, a major feature of provenance is its large size, yet very few good compression algorithms have been developed for it. In addition, although provenance records the history of data generation, little research has used it to ensure data reliability or to analyze system intrusions.

This paper proposes a hybrid approach that combines web graph compression and dictionary encoding to compress provenance efficiently.
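Web graph compression and dictionary encoding are both standard techniques, and the following is a minimal sketch of how they might be combined on a provenance graph. It is an illustration under simplifying assumptions, not the dissertation's algorithm: hypothetical object names are dictionary-encoded to integer IDs in first-appearance order, and each node's sorted ancestor list is then stored in the style of WebGraph-like compressors, as a bitmask of entries copied from the previous node's list (reference encoding) plus the remaining entries as gaps, all written as variable-length integers. Production web graph compressors pick the best reference within a window and use denser bit-level codes; this sketch always references the immediately preceding node.

```python
from typing import Dict, List, Optional, Tuple


def build_dictionary(strings: List[str]) -> Tuple[Dict[str, int], List[str]]:
    """Dictionary encoding: give each distinct string a small integer ID."""
    ids: Dict[str, int] = {}
    table: List[str] = []
    for s in strings:
        if s not in ids:
            ids[s] = len(table)  # IDs follow first-appearance order
            table.append(s)
    return ids, table


def encode_varint(n: int, out: bytearray) -> None:
    """Append a non-negative int as a LEB128-style variable-length integer."""
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return


def decode_varint(data: bytes, pos: int) -> Tuple[int, int]:
    """Read one varint starting at pos; return (value, next position)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def zigzag(n: int) -> int:
    """Map signed ints to non-negative ones: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    return n << 1 if n >= 0 else (-n << 1) - 1


def unzigzag(z: int) -> int:
    """Inverse of zigzag."""
    return z >> 1 if z % 2 == 0 else -((z + 1) >> 1)


def compress_graph(adj: List[List[int]]) -> bytes:
    """Reference + gap encoding of each node's sorted successor list."""
    out = bytearray()
    encode_varint(len(adj), out)
    prev: List[int] = []
    for node, raw in enumerate(adj):
        succ = sorted(set(raw))
        succ_set, prev_set = set(succ), set(prev)
        # Reference part: bit k is set if prev[k] is copied into this list.
        mask = 0
        for k, x in enumerate(prev):
            if x in succ_set:
                mask |= 1 << k
        encode_varint(mask, out)
        # Extra part: successors not copied from the reference, as gaps.
        extra = [x for x in succ if x not in prev_set]
        encode_varint(len(extra), out)
        last: Optional[int] = None
        for x in extra:
            # First gap is relative to the node's own ID (may be negative).
            gap = zigzag(x - node) if last is None else x - last - 1
            encode_varint(gap, out)
            last = x
        prev = succ
    return bytes(out)


def decompress_graph(data: bytes) -> List[List[int]]:
    n, pos = decode_varint(data, 0)
    adj: List[List[int]] = []
    prev: List[int] = []
    for node in range(n):
        mask, pos = decode_varint(data, pos)
        succ = [x for k, x in enumerate(prev) if mask >> k & 1]
        count, pos = decode_varint(data, pos)
        last: Optional[int] = None
        for _ in range(count):
            v, pos = decode_varint(data, pos)
            last = node + unzigzag(v) if last is None else last + v + 1
            succ.append(last)
        succ.sort()
        adj.append(succ)
        prev = succ
    return adj


if __name__ == "__main__":
    # Hypothetical provenance records: (object, ancestor) pairs.
    records = [("main.o", "main.c"), ("main.o", "gcc"), ("util.o", "util.c"),
               ("util.o", "gcc"), ("app", "main.o"), ("app", "util.o")]
    ids, table = build_dictionary([name for pair in records for name in pair])
    adj: List[List[int]] = [[] for _ in table]
    for obj, anc in records:
        adj[ids[obj]].append(ids[anc])
    blob = compress_graph(adj)
    assert decompress_graph(blob) == [sorted(a) for a in adj]
    print(f"{len(table)} names, {len(records)} edges -> {len(blob)} bytes")
```

When consecutive nodes share ancestors (for example, several object files produced by the same compiler process), the copy bitmask replaces each repeated ID with a single bit; how much this saves in practice depends on how IDs are ordered, which is why locality matters to this family of methods.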
This hybrid method exploits the similarity between provenance graphs and web graphs: it is designed to take full advantage of the locality and similarity among provenance graph nodes, and to eliminate the repetitive strings inherent in provenance information. Compared with previous methods, it can also compress the edge information in provenance graphs, works at a much finer granularity, and supports efficient queries. Experimental results indicate that, among the compression methods compared, this method achieves the best tradeoff among compression ratio, compression time, and query performance.

This paper proposes a provenance-based rebuild method that focuses on rebuilding individual objects and can rebuild files in parallel and in priority order. By using provenance to backtrack the data generation process, it can accurately rebuild lost or damaged files. Previous data storage solutions (for instance, log files, snapshots, backups, or ECC) focus more on the hard disk or on system security; the main advantage of using provenance is that a rebuild can target a single data object, run in parallel on multiple objects, and follow priorities. The provenance-based rebuild system collects provenance during normal file reads and writes, rebuilds a file automatically when it is lost or damaged, and recovers other files affected during the rebuild. Experimental results show that provenance-based rebuild performs significantly better than log-based reconstruction. Although various factors influence provenance-based rebuild performance, their impact is small.

This paper proposes using provenance to detect intrusions.
By collecting the provenance of the processes that interact with the system, the provenance-based intrusion detection method can determine the detailed behavior patterns of an intruding process as it accesses and modifies files, making it easy to judge whether the system has been compromised and to identify system vulnerabilities. This method overcomes the complexity and inefficiency of manually analyzing conventional system or network logs. Moreover, because a log generally records only part of the information about a system event (for instance, HTTP connections or login records), the whole analysis process is very difficult. The provenance-based intrusion detection method treats each network connection that interacts with the system as a file object, collects dependency provenance between system processes and file objects, and then constructs provenance graphs and finds the intrusion path, so that the administrator can analyze the intrusion along the intrusion chain and determine its sources. Experimental results show that, compared with previous methods, the provenance-based intrusion detection scheme has a much lower false detection rate and a higher detection rate. In addition, it has very small space overhead and almost no impact on system performance.

This paper proposes using object-based active storage technology to significantly improve provenance processing and its network transfer performance. Because provenance is generated continuously and in large volumes, transferring it becomes a major network bottleneck in network-attached environments. Object-based active storage can solve this problem efficiently. On the one hand, active storage offloads provenance processing from the host to the storage device, greatly reducing the amount of provenance transmitted over the network.
On the other hand, object-based storage devices have more processing capability than traditional block devices and can process provenance more intelligently and automatically. Ordinary data files and provenance database records are stored as user objects in the object storage device, while the various data processing tasks are stored as function objects, which are scheduled for execution to carry out tasks such as provenance compression, provenance queries, and data reconstruction. Experimental results show that object-based active storage can significantly improve provenance-based rebuild performance.
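Both the provenance-based rebuild and the provenance-based intrusion detection described above come down to walking a provenance graph backwards from an object of interest. The following is a minimal sketch of that idea under simplifying assumptions that are not taken from the dissertation: the graph is an in-memory dict mapping each node to the nodes it was derived from, node kinds are marked by hypothetical `file:`, `proc:` and `socket:` prefixes (network connections treated as file objects), and the graph is acyclic. `intrusion_paths` lists chains from a network connection to a suspicious file; `rebuild_order` orders lost files so that lost ancestors are regenerated first and higher-priority requests are served earlier.

```python
import heapq
from collections import deque
from typing import Dict, List, Set

# Hypothetical provenance graph: each node maps to the nodes it was derived
# from. A file depends on the process that wrote it; a process depends on the
# files and network connections it read. Node kinds use assumed prefixes
# "file:", "proc:" and "socket:" (network connections treated as files).
Graph = Dict[str, List[str]]


def ancestors(graph: Graph, start: str) -> Set[str]:
    """Every node reachable by following dependency edges back from start."""
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        for dep in graph.get(queue.popleft(), []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


def intrusion_paths(graph: Graph, target: str) -> List[List[str]]:
    """Dependency chains from network connections to a suspicious object."""
    paths: List[List[str]] = []
    stack: List[List[str]] = [[target]]
    while stack:
        path = stack.pop()
        if path[-1].startswith("socket:"):
            paths.append(path[::-1])  # report as source -> ... -> target
            continue
        for dep in graph.get(path[-1], []):
            if dep not in path:  # guard against cycles
                stack.append(path + [dep])
    return paths


def rebuild_order(graph: Graph, lost: Set[str], priority: Dict[str, int]) -> List[str]:
    """Order in which lost files are regenerated: a lost file's lost ancestors
    come before it, and independent requests are served by priority."""
    order: List[str] = []
    visited: Set[str] = set()

    def visit(node: str) -> None:
        if node in visited:
            return
        visited.add(node)
        for dep in graph.get(node, []):
            visit(dep)  # rebuild lost inputs before re-running this step
        if node in lost:
            order.append(node)

    heap = [(-priority.get(f, 0), f) for f in lost]  # highest priority first
    heapq.heapify(heap)
    while heap:
        visit(heapq.heappop(heap)[1])
    return order


if __name__ == "__main__":
    g: Graph = {
        "file:/var/www/index.php": ["proc:php-fpm"],
        "proc:php-fpm": ["file:/etc/php.ini", "socket:203.0.113.7:4444"],
        "file:report.pdf": ["proc:latex"],
        "proc:latex": ["file:report.tex", "file:fig.png"],
        "file:fig.png": ["proc:plot"],
        "proc:plot": ["file:data.csv"],
    }
    print(intrusion_paths(g, "file:/var/www/index.php"))
    # -> [['socket:203.0.113.7:4444', 'proc:php-fpm', 'file:/var/www/index.php']]
    print(rebuild_order(g, {"file:report.pdf", "file:fig.png"},
                        {"file:report.pdf": 2, "file:fig.png": 1}))
    # -> ['file:fig.png', 'file:report.pdf']
```

Lost files whose `ancestors` sets do not overlap share no rebuild work and could be handed to separate workers, which is one plausible way to organize a parallel rebuild. The recursive `visit` assumes an acyclic graph of modest depth; a very deep provenance chain would need an iterative traversal instead.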
Keywords/Search Tags: Provenance, Web Compression, Data Rebuild, Intrusion Detection, Object Storage, Active Storage