Huge Amounts Of Digital Image Processing Platform Based On Hadoop

Posted on:2017-11-14

Degree:Master

Type:Thesis

Country:China

Candidate:B Wang

Full Text:PDF

GTID:2348330512969380

Subject:Signal and Information Processing

Abstract/Summary:

The rapid development of the Internet has made social media highly active, and Internet companies have accumulated huge amounts of digital image data as a result; in addition, remote sensing images and medical images are generated every day. Processing images in small batches on a single server cannot keep up with this growth, so Internet companies and research institutions face the problem of how to process huge numbers of images efficiently and mine value from the image data.

In the past, analyzing a huge image set meant sampling some images from it. For data mining tasks, however, sampled images lose a lot of important information, because image content is complex and its information density is low, so processing the whole image set has become an urgent demand. The distributed storage and computing offered by cloud computing can provide solutions for processing huge image sets. Hadoop is an excellent open source platform for processing large data sets. The key point of this thesis is how to design an image processing platform based on Hadoop.

The major work of this thesis is as follows:

1. This thesis extends the basic data types of Hadoop and designs two new data types that can store image files. Combined with OpenCV, the new data types provide rich functions that simplify image processing for users. According to the different demands of image processing, this thesis also designs an image input format and output format. With this work, users can process huge image sets on the Hadoop platform through MapReduce jobs.

2. Hadoop files support a write-once, read-many mode but do not support random modification. To delete some key-value pairs from a SequenceFile, one must first traverse the entire file, remove the key-value pairs to be deleted, and generate a new SequenceFile, and then use the new SequenceFile as the input of MapReduce. This thesis implements an improved SequenceFileInputFormat that can drop user-defined key-value pairs from a SequenceFile and pass the remaining key-value pairs to MapReduce as input, while the SequenceFile itself is left unchanged.

3. This thesis designs a platform that can process and mine huge image sets. In addition, several image processing scenarios are introduced and implemented on this platform. Finally, the feasibility and performance of the platform are tested.
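The idea behind the second contribution (skipping deleted key-value pairs at read time instead of rewriting the file) can be illustrated with a minimal sketch. It is not the thesis's implementation and does not use the Hadoop API: `ImageRecord`, `FilteringReader`, and `FilterDemo` are hypothetical names, and a plain in-memory list stands in for the SequenceFile. In a real Hadoop job, comparable logic would presumably sit in a custom `RecordReader` returned by a `SequenceFileInputFormat` subclass.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Set;

/** Hypothetical stand-in for one key-value pair (image name -> image bytes). */
final class ImageRecord {
    final String key;
    final byte[] value;

    ImageRecord(String key, byte[] value) {
        this.key = key;
        this.value = value;
    }
}

/**
 * Wraps a sequential reader and skips every record whose key is in the
 * user-supplied deletion set. The underlying storage is never modified.
 */
final class FilteringReader implements Iterator<ImageRecord> {
    private final Iterator<ImageRecord> source;
    private final Set<String> deletedKeys;
    private ImageRecord next; // next surviving record, or null when exhausted

    FilteringReader(Iterator<ImageRecord> source, Set<String> deletedKeys) {
        this.source = source;
        this.deletedKeys = deletedKeys;
        advance();
    }

    /** Moves forward to the next record that is not marked as deleted. */
    private void advance() {
        next = null;
        while (source.hasNext()) {
            ImageRecord candidate = source.next();
            if (!deletedKeys.contains(candidate.key)) {
                next = candidate;
                return;
            }
        }
    }

    @Override
    public boolean hasNext() {
        return next != null;
    }

    @Override
    public ImageRecord next() {
        if (next == null) {
            throw new NoSuchElementException();
        }
        ImageRecord current = next;
        advance();
        return current;
    }
}

public class FilterDemo {
    /** Returns the keys a downstream map task would see. */
    static List<String> survivingKeys(List<ImageRecord> stored, Set<String> deleted) {
        List<String> keys = new ArrayList<>();
        FilteringReader reader = new FilteringReader(stored.iterator(), deleted);
        while (reader.hasNext()) {
            keys.add(reader.next().key);
        }
        return keys;
    }

    public static void main(String[] args) {
        // In-memory stand-in for the contents of a SequenceFile (assumption).
        List<ImageRecord> stored = List.of(
                new ImageRecord("a.jpg", new byte[] {1}),
                new ImageRecord("b.jpg", new byte[] {2}),
                new ImageRecord("c.jpg", new byte[] {3}));
        System.out.println(survivingKeys(stored, Set.of("b.jpg"))); // prints [a.jpg, c.jpg]
    }
}
```

The filter runs during the same sequential pass that feeds the job, so no second file is written, and the stored records stay intact for later jobs that use a different deletion set.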

Keywords/Search Tags:

Hadoop, huge image processing, cloud computing, file input and output format

Related items

1	Research And Implementation Of Methods For Parallel Processing Of Face Image Recognition Based On Hadoop
2	Research On The Key Technologies Of Cloud Computing Platform Hadoop
3	The Research On Massive Small Files Processing Under The Hadoop
4	Cloud Computing Network Printing Devices And File Format Conversion Algorithm To Achieve
5	Based On The Hadoop Mass File Storage System Analysis And Design
6	Research On Management Of Logistics Massive Data And Its Application In Cloud Environment
7	Research Of Medical Data Processing Technology Based On Cloud Computing
8	Research Of Digital Museum Architecture Based On Hadoop
9	Research On Medical Data Processing Technology Based On Hadoop
10	The Study Of Massive Image Processing Based On Cloud Computing