
Study On The Optimization Method Of Massive Medical Image Data Processing Based On Hadoop

Posted on: 2015-05-08; Degree: Master; Type: Thesis
Country: China; Candidate: Y N Wang; Full Text: PDF
GTID: 2208330428978592; Subject: Communication and Information System
Abstract/Summary:
With the rapid development of science and technology, modern medical diagnosis has become increasingly dependent on medical imaging. Medical images not only help medical staff quickly determine the cause of a condition but also play an important role in areas such as scientific research. As imaging equipment has become widespread, more and more imaging methods are used in medical examinations, so the volume of medical image data is growing explosively, and the question of how to store and process it has become urgent. A traditional PACS (picture archiving and communication system) manages the images produced by digital medical equipment, and many large hospitals currently use one to store and process patients' image data. However, with the rapid expansion of image data and growing patient demand for medical diagnosis, traditional PACS has begun to show its shortcomings, such as high construction cost and limited performance and scalability, so new ways to store and process massive medical image data are needed.

With the development of distributed systems, Google introduced GFS in 2003 and MapReduce in 2004, and Hadoop emerged against this background. Hadoop is an open-source Apache software platform for distributed computing whose core components are HDFS and the MapReduce framework. It helps enterprises store and process massive data sets and has been adopted by a growing number of them.
This work was carried out as part of the project "Ophthalmic Image Services Key Technology Research". Its main subject is how to use Hadoop to store and process massive medical images. Storing medical image files in Hadoop still raises the following problems: (1) a large number of small files consumes a great deal of memory; (2) retrieving small files is very inefficient, and accessing a large number of small files is much slower than accessing a few large files of the same total size; (3) HDFS is poorly suited to low-latency, real-time applications, and its write performance is much lower than its read performance. To address these problems, this thesis analyzes the architecture of HDFS and the job mechanism of MapReduce, examines two existing schemes for handling small files together with their shortcomings, and then, taking practical requirements into account, proposes its own solution. The contributions of this thesis are as follows:
1. Based on a study of the DICOM medical image standard and the SequenceFile format, and in response to Hadoop's small-file problem, this thesis proposes and designs a new sequential medical image format, SF-DICOM, to overcome the heavy NameNode memory consumption that HDFS incurs when storing massive numbers of small files.
2. DICOM files are merged by time, and the corresponding merging algorithm is designed.
3. Based on the Trie, this thesis constructs a secondary index mechanism and establishes an internal mapping between DICOM files and SF-DICOM files, to solve the low efficiency of random reads of individual DICOM files from a SequenceFile.
4. Following this design, a Hadoop experimental environment is built and a verification system is developed to verify the feasibility and efficiency of the design.
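The general idea behind merging small files by time and indexing them with a Trie can be illustrated with a minimal, in-memory Python sketch. It is not the SF-DICOM format or the thesis's algorithm, whose details are not given in this abstract. The record layout, the one-container-per-day grouping, and the names `Trie`, `merge_by_day`, and `read_file` are all assumptions made for illustration, and plain byte strings stand in for both DICOM files and HDFS containers.

```python
import io
import struct
from collections import defaultdict


class TrieNode:
    __slots__ = ("children", "value")

    def __init__(self):
        self.children = {}   # next character -> TrieNode
        self.value = None    # payload stored at the end of a key, if any


class Trie:
    """Character trie mapping a file name to its location (hypothetical index)."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, key, value):
        node = self.root
        for ch in key:
            node = node.children.setdefault(ch, TrieNode())
        node.value = value

    def get(self, key):
        node = self.root
        for ch in key:
            node = node.children.get(ch)
            if node is None:
                return None
        return node.value


def merge_by_day(files):
    """Pack (name, 'YYYYMMDD', payload) tuples into one container per day.

    Assumed record layout: 4-byte name length, 4-byte payload length,
    name bytes, payload bytes. Returns (containers, index), where the
    index maps each name to (day, payload offset, payload length).
    """
    buffers = defaultdict(io.BytesIO)
    index = Trie()
    for name, day, payload in sorted(files, key=lambda f: (f[1], f[0])):
        buf = buffers[day]
        key = name.encode("utf-8")
        buf.write(struct.pack(">II", len(key), len(payload)))
        buf.write(key)
        offset = buf.tell()          # where this payload starts
        buf.write(payload)
        index.insert(name, (day, offset, len(payload)))
    containers = {day: buf.getvalue() for day, buf in buffers.items()}
    return containers, index


def read_file(containers, index, name):
    """Random read of one small file via the index, without scanning."""
    entry = index.get(name)
    if entry is None:
        raise KeyError(name)
    day, offset, length = entry
    return containers[day][offset:offset + length]
```

In this sketch, many small files collapse into a few large containers, which is the property that reduces per-file metadata on the NameNode. The index lookup then costs time proportional to the length of the file name, so a single file can be read without scanning the container sequentially.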
Keywords/Search Tags:Hadoop, DICOM, SequenceFile, Trie Tree, Small Files