Study On The Optimization Method Of Massive Medical Image Data Processing Based On Hadoop

Posted on:2015-05-08

Degree:Master

Type:Thesis

Country:China

Candidate:Y N Wang

Full Text:PDF

GTID:2208330428978592

Subject:Communication and Information System

Abstract/Summary:

With the rapid development of science and technology, modern medical diagnosis has become increasingly dependent on medical images. Medical images not only help medical staff quickly determine the cause of a disease but also play an important role in areas such as scientific research. As medical imaging equipment has become widespread, more imaging methods are being applied in medical examinations, so the volume of medical image data is growing explosively, and the question of how to store and process this data has become urgent. A traditional picture archiving and communication system (PACS) manages the images produced by digital medical equipment, and many large hospitals currently use one to store and process patients' image data. However, with the rapid expansion of image data and growing patient demand for medical diagnosis, traditional PACS has begun to expose its shortcomings, such as high construction cost and limited performance and scalability, so new ways to store and process massive medical image data are needed. With the development of distributed systems, Google introduced GFS and MapReduce (published in 2003 and 2004, respectively), and Hadoop emerged against this background. Hadoop is an open-source Apache software platform for distributed computing whose core includes HDFS and the MapReduce framework. It helps enterprises store and process massive data and has been adopted by a growing number of them.
This work was carried out as part of the project "Ophthalmic Image Services Key Technology Research". Its main subject is how to use Hadoop to store and process massive medical images. Storing medical image files in Hadoop, however, raises the following problems: (1) the files take up a lot of memory space; (2) retrieving small files is inefficient, and accessing a large number of small files is much slower than accessing a few large files of the same total size; (3) HDFS is not suitable for low-latency, real-time applications, and its file-writing performance is much lower than its reading performance. To address these problems, this thesis analyzes the structure of HDFS and the job mechanism of MapReduce, studies two existing schemes for handling small files and their shortcomings, and then, taking practical requirements into account, proposes its own solution. The contributions of this thesis are as follows: 1. Based on a study of the DICOM medical image standard and SequenceFile, and in view of Hadoop's small-file problem, a new sequential medical image format, SF-DICOM, is proposed and designed to address the heavy NameNode memory consumption caused by storing massive numbers of small files in HDFS. 2. DICOM files are merged according to time, and a corresponding merging algorithm is designed. 3. A Trie-based secondary index mechanism is constructed, and an internal mapping between DICOM files and SF-DICOM files is established, to address the low efficiency of random reads of DICOM files from a SequenceFile. 4. A Hadoop experimental environment is built and a verification system is developed to verify the feasibility and efficiency of the design.
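
The merging and indexing ideas in contributions 2 and 3 can be illustrated with a minimal Python sketch. It is not the thesis's SF-DICOM format or its actual algorithm, which the abstract does not specify. The sketch assumes that small files are grouped by acquisition date, that an in-memory byte string stands in for a merged file on HDFS, and that a plain single-level trie keyed on a file identifier maps each original file to its container, offset and length. All names (merge_by_date, TrieIndex, read_file, the "merged-" prefix) are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical input record: (file_id, acquisition date "YYYYMMDD", payload).
SmallFile = Tuple[str, str, bytes]
# Index entry: (container name, byte offset inside the container, length).
Location = Tuple[str, int, int]


def merge_by_date(
    files: List[SmallFile],
) -> Tuple[Dict[str, bytes], Dict[str, Location]]:
    """Concatenate payloads that share a date into one container per date.

    Returns the containers and the location of every original file.
    """
    buffers: Dict[str, bytearray] = defaultdict(bytearray)
    locations: Dict[str, Location] = {}
    # Sort by (date, id) so the layout inside each container is deterministic.
    for file_id, date, payload in sorted(files, key=lambda f: (f[1], f[0])):
        container = f"merged-{date}"  # assumed naming scheme
        buf = buffers[container]
        locations[file_id] = (container, len(buf), len(payload))
        buf.extend(payload)
    return {name: bytes(buf) for name, buf in buffers.items()}, locations


@dataclass
class TrieNode:
    children: Dict[str, "TrieNode"] = field(default_factory=dict)
    location: Optional[Location] = None  # set only where a full key ends


class TrieIndex:
    """Character trie mapping a file identifier to its Location."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, key: str, location: Location) -> None:
        node = self.root
        for ch in key:
            node = node.children.setdefault(ch, TrieNode())
        node.location = location

    def lookup(self, key: str) -> Optional[Location]:
        node = self.root
        for ch in key:
            node = node.children.get(ch)
            if node is None:
                return None
        return node.location


def read_file(
    containers: Dict[str, bytes], index: TrieIndex, file_id: str
) -> Optional[bytes]:
    """Random read of one original file via the index, without scanning."""
    loc = index.lookup(file_id)
    if loc is None:
        return None
    name, offset, length = loc
    return containers[name][offset:offset + length]
```

In this sketch, many small files become one container per date, so a store that keeps per-file metadata in memory would track far fewer objects, and a lookup walks the trie in time proportional to the identifier length instead of scanning a merged file sequentially.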

Keywords/Search Tags:

Hadoop, DICOM, SequenceFile, Trie Tree, Small Files

Related items

1	Research On Processing Techniques Of Massive Small Files Based On Hadoop
2	Research Of Improving Storage Of Replica And Small Files Merging And Access Optimization On Hadoop Platform
3	Processing Of Small Files Based On HDFS And Optimization And Improvement Of The Performance For Mapreduce Computing Model
4	Design And Implementation Of The Key Techniques For Storing And Retrieving Massive Small Files In Hadoop
5	Research On Access Optimization Of Small Files In Hadoop Cluster
6	Research And Optimization Of Small Files Processing Techniques In Hadoop
7	Research And Implementation Of Small Files Storage Management Based On Hadoop
8	Study On Processing Of Massive Small Files Based On Hadoop
9	Research On Performance Optimization And Its Reusability For Managing Massive Numbers Of Small Files
10	The Research And Implementation Of Method Regarding To The Small Files Problem Of Hadoop