Research On Key Technologies Of Redundancy Elimination For Medical Big Data

Posted on: 2024-06-19 | Degree: Doctor | Type: Dissertation | Country: China | Candidate: L Xiao | GTID: 1524307310482284 | Subject: Computer application technology

Abstract/Summary:

The popularization of the Internet and the widespread adoption of intelligent devices have driven the development of medical big data. The explosive growth of medical big data places enormous pressure on storage systems, and efficiently reducing the storage and economic cost of medical data has become an urgent problem. Deduplication and delta compression can effectively eliminate redundancy in storage systems and improve storage utilization. However, the multimodality and complexity of medical data make redundancy elimination more challenging. This thesis analyzes the fundamental problems of redundancy elimination for medical big data and studies redundancy elimination technologies in depth, aiming at secure and reliable schemes with high scalability, high performance, low storage cost, and low resource cost. The main research work and contributions are as follows:

(1) A secure deduplication scheme based on data similarity, ESDedup, is proposed to reduce the fragmentation caused by deduplication and to improve the efficiency of verifying, after restore, the integrity of data with a high proportion of duplicate blocks. First, because different types of medical data exhibit different duplication properties, ESDedup analyzes the redundancy characteristics of medical data and proposes a similarity calculation method to determine the duplication property of the data. Then, for medical data that lack temporal locality, ESDedup designs a block rewriting method based on maximum similarity that classifies and rewrites data blocks, which eliminates fragmentation and improves restore performance. Finally, to efficiently verify the integrity of medical data restored after deduplication, ESDedup proposes a fingerprint-based integrity verification method whose auditing strategy reuses deduplication metadata, which significantly improves auditing efficiency. Experimental results show that ESDedup improves restore performance, deletion performance, and integrity verification efficiency, and improves the deduplication ratio by 55.9% compared with MFDedup, the current mainstream block rewriting method.

(2) A high-reliability redundancy elimination scheme based on multiple features, MFRE, is proposed to reduce the overlapping deltas generated by post-deduplication delta compression and to improve the reliability of post-deduplication delta compression. First, because different redundancy elimination methods yield markedly different storage benefits for different kinds of medical data, MFRE uses a clustering method based on shared-block relationships to identify the redundancy type of the data, and applies hybrid redundancy elimination to diverse data to minimize storage overhead. Then, MFRE proposes similarity-based dynamic post-deduplication delta compression and temporal-locality-based dynamic delta compression, which minimize overlapping deltas and improve the delta compression ratio by identifying more similar base blocks. This method also applies to medical data that lack temporal locality and scales better. Finally, MFRE designs a deduplication-aware layered fault-tolerance method: depending on container utilization, containers are protected with either replicas or erasure codes, which ensures high reliability while reducing storage overhead and improving access performance. In addition, a load-balancing data placement strategy that accounts for node workloads further improves performance in heterogeneous environments. Experimental results show that MFRE improves the delta compression ratio and delta compression efficiency while maintaining high reliability, and that its fault recovery performance is 36% higher than that of Rep EC-Duet, the current mainstream high-reliability post-deduplication delta compression method.

(3) A secure, lossless, semantics-aware redundancy elimination scheme, SLRE, is proposed to reduce the time overhead of hybrid redundancy elimination and to improve the efficiency of verifying, after restore, the integrity of data with a low proportion of duplicate blocks. First, SLRE uses similarity and information entropy to represent the redundancy characteristics of data. Then, it proposes content-defined proxy similarity and proxy entropy to reduce the time overhead of computing similarity and information entropy. SLRE also applies post-deduplication delta compression based on proxy similarity, which assigns priorities to different redundant blocks and indexes similar blocks in priority order, and it uses Bloom filters to reduce the time overhead of indexing dissimilar blocks. Furthermore, SLRE designs LZ compression based on proxy entropy, which filters out high-entropy data and improves system throughput. Finally, SLRE uses parallelization based on the Merkle tree concept to improve integrity verification efficiency for data with a low proportion of duplicate blocks. Experimental results show that SLRE significantly reduces the time overhead of redundancy elimination while improving storage efficiency and integrity verification efficiency.

Keywords/Search Tags: medical big data, storage systems, redundancy elimination, data deduplication, delta compression, data integrity, reliability
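Deduplication, which underlies all three schemes above, splits data into blocks, fingerprints each block, and stores each unique block only once. The following is a minimal Python sketch of that generic pipeline, not of ESDedup's similarity-based rewriting: it assumes Gear-hash content-defined chunking (a widely used chunking technique that the abstract does not name) and SHA-256 fingerprints. The parameters `GEAR`, `BOUNDARY_MASK`, `MIN_SIZE`, and `MAX_SIZE` and the functions `chunk`, `deduplicate`, and `restore` are hypothetical names chosen for illustration.

```python
import hashlib
import random

# Hypothetical chunking parameters (assumptions, not taken from the thesis).
_rng = random.Random(2024)                 # fixed seed so boundaries are reproducible
GEAR = [_rng.getrandbits(64) for _ in range(256)]
BOUNDARY_MASK = ((1 << 13) - 1) << 51      # top 13 bits: a candidate boundary about every 8 KiB
MIN_SIZE, MAX_SIZE = 2 * 1024, 64 * 1024
U64 = (1 << 64) - 1


def chunk(data: bytes):
    """Yield content-defined chunks using a Gear rolling hash."""
    start, fp = 0, 0
    for i, byte in enumerate(data):
        fp = ((fp << 1) + GEAR[byte]) & U64  # only the last 64 bytes affect the hash
        size = i - start + 1
        if (size >= MIN_SIZE and (fp & BOUNDARY_MASK) == 0) or size >= MAX_SIZE:
            yield data[start:i + 1]
            start, fp = i + 1, 0
    if start < len(data):
        yield data[start:]


def deduplicate(data: bytes, store: dict) -> list:
    """Store each unique chunk once and return the recipe (list of fingerprints)."""
    recipe = []
    for c in chunk(data):
        fingerprint = hashlib.sha256(c).digest()
        store.setdefault(fingerprint, c)     # a duplicate chunk is not stored again
        recipe.append(fingerprint)
    return recipe


def restore(recipe: list, store: dict) -> bytes:
    """Rebuild the original data from its recipe."""
    return b"".join(store[fp] for fp in recipe)
```

Writing the same data twice yields identical recipes and adds no new chunks to the store; because boundaries depend on content rather than offsets, an insertion early in a file only changes the chunks near the edit.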
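The abstract states that SLRE uses Bloom filters to reduce the time spent indexing dissimilar blocks. One common way to do this, sketched below under our own assumptions, is to keep a Bloom filter of the similarity features already present in the index: a Bloom filter never gives false negatives, so a block whose feature is reported absent can skip the index lookup entirely. The class `BloomFilter`, the helpers `find_base_block` and `register_block`, the byte-string features, and the filter sizes are all hypothetical; the in-memory `dict` stands in for what would normally be an on-disk index, where each skipped lookup saves I/O.

```python
import hashlib
from typing import Optional


class BloomFilter:
    """Minimal Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes) -> list:
        # Double hashing: derive k bit positions from two 64-bit slices of SHA-256.
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1  # odd step, distinct positions
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key: bytes) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: bytes) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


# Hypothetical similarity index: feature value -> id of a stored base block.
similarity_index: dict = {}
seen_features = BloomFilter()


def find_base_block(feature: bytes) -> Optional[int]:
    if not seen_features.might_contain(feature):
        return None  # definitely no similar block: skip the costly index lookup
    return similarity_index.get(feature)  # may still miss on a false positive


def register_block(feature: bytes, block_id: int) -> None:
    similarity_index[feature] = block_id
    seen_features.add(feature)
```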
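SLRE filters out high-entropy data before LZ compression to improve throughput. The sketch below illustrates that idea with plain byte-level Shannon entropy, H = -Σ p_i log2 p_i over the byte values of a block, rather than the thesis's content-defined proxy entropy, and it uses Python's `zlib` (DEFLATE, an LZ77-based compressor) as the LZ stage. The 7.5 bits/byte threshold and the function names are assumptions: blocks at or above the threshold, such as already-compressed images or encrypted data, are stored as-is because compressing them would mostly waste CPU time.

```python
import math
import zlib
from collections import Counter

# Hypothetical threshold in bits per byte (the maximum possible is 8.0).
ENTROPY_THRESHOLD = 7.5


def shannon_entropy(block: bytes) -> float:
    """Byte-level Shannon entropy of a block, in bits per byte."""
    if not block:
        return 0.0
    n = len(block)
    return -sum((c / n) * math.log2(c / n) for c in Counter(block).values())


def maybe_compress(block: bytes) -> tuple:
    """Return (compressed?, payload), skipping compression for high-entropy blocks."""
    if shannon_entropy(block) >= ENTROPY_THRESHOLD:
        return False, block
    return True, zlib.compress(block)
```

A repetitive CSV-like block has low entropy and is compressed, whereas 4 KiB from `os.urandom` scores close to 8 bits/byte and is passed through untouched. Computing exact entropy still reads every byte, which is the cost that a sampled, proxy-style estimate would aim to cut.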