Storage And Retrieval Of Medical Image Files Based On Hadoop

Posted on: 2020-08-14
Degree: Master
Type: Thesis
Country: China
Candidate: W J Chen
GTID: 2404330623956742
Subject: Computer Science and Technology
Abstract/Summary:
A PACS (picture archiving and communication system) is used in hospital imaging departments. Its main task is to preserve, in digital form, the medical images produced in daily work. Although PACS has made great progress in recent years, it still has not solved the problem of a unified storage architecture. In particular, as the amount of stored data grows, using a traditional relational database as the storage scheme leads to inefficient query and retrieval, and results cannot be obtained in a short time. To address the limitations of existing medical image file storage and retrieval systems under large data volumes, this dissertation improves the original PACS and proposes a storage and retrieval scheme based on the Hadoop distributed file system (HDFS) and the HBase distributed database, with the goal of improving the storage and retrieval efficiency of medical image files. To this end, the dissertation carries out the following research:

(1) A scheme is presented to optimize HDFS performance when storing both large and small files. Small files are merged into large ones: all medical image files in the same series are merged into a SequenceFile, and the SequenceFiles with the same examination number are then merged into a MapFile. The merged files are generally large, which suits the high-performance processing of large data blocks in HDFS.

(2) A storage scheme for DICOM image files is proposed. It parses the hierarchical information of DICOM files and stores it in one HBase column family, while a second column family stores the address of the image source file on HDFS to support retrieval. dcm4che3 is used to extract the required information from DICOM image files. In addition, a proprietary image file format is constructed, and a mapping model from relational database queries to non-relational database tasks is established through the Map/Reduce task flow, so that the scheme fits the Map/Reduce job flow better.

(3) An optimized Bloom filter algorithm is used for indexing. The purpose of the index is fast querying, and multi-field retrieval fundamentally amounts to finding matching fields in a large set, so the Bloom filter is adopted as the main idea. However, the traditional Bloom filter has a certain error rate, so an improved Bloom filter algorithm is used to reduce that error.

(4) To verify the practical effect of the scheme, the dissertation carries out an application study. The Hadoop storage architecture and the HBase distributed database are used to store and retrieve the image files produced by the imaging department of a hospital. The experimental results show that, compared with the traditional MySQL storage architecture, the performance gap is not obvious when the data volume is small, but the proposed scheme has a clear advantage as the data volume increases. In the retrieval experiments in particular, the Bloom filter combined with the improved HBase secondary index scheme helps reduce data query time.
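The Bloom filter idea in item (3) can be illustrated with a minimal Python sketch. This is the standard Bloom filter, not the improved variant developed in the dissertation, whose details are not given in the abstract. The class name `BloomFilter`, its parameters, and the sample keys are assumptions made for illustration; the sizing uses the textbook formulas for bit count and hash count.

```python
import hashlib
import math


class BloomFilter:
    """Standard Bloom filter (illustrative; not the thesis's improved variant).

    It never reports a stored key as absent, but it may report an
    absent key as present (a false positive).
    """

    def __init__(self, expected_items: int, false_positive_rate: float = 0.01):
        # Textbook sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
        self.num_bits = max(1, math.ceil(
            -expected_items * math.log(false_positive_rate) / (math.log(2) ** 2)))
        self.num_hashes = max(1, round(self.num_bits / expected_items * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, key: str):
        # Double hashing: derive k bit positions from two parts of one SHA-256 digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so the step is never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


# Hypothetical usage: index "field=value" strings taken from image metadata.
index = BloomFilter(expected_items=1000, false_positive_rate=0.01)
index.add("PatientID=P0001")
index.add("Modality=CT")
print(index.might_contain("PatientID=P0001"))
```

A filter of this kind can be consulted before a more expensive lookup: a negative answer allows the lookup to be skipped, while a positive answer still has to be confirmed against the stored data because of the false-positive rate.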
Keywords/Search Tags: Medical Images, Hadoop, HBase, Bloom Filter Algorithm