
A Hadoop-Based Platform for Processing Huge Amounts of Digital Images

Posted on: 2017-11-14
Degree: Master
Type: Thesis
Country: China
Candidate: B Wang
Full Text: PDF
GTID: 2348330512969380
Subject: Signal and Information Processing
Abstract/Summary:
The rapid development of the Internet has made social media increasingly active, and Internet companies have accumulated huge amounts of digital image data; in addition, remote sensing images and medical images are generated every day. Small-batch image processing on a single server cannot cope with this growing volume, so Internet companies and scientific research institutions face the problem of how to process huge numbers of images effectively and extract value from image data.

In the past, analyzing a huge image set meant sampling some images from it. For data mining tasks, however, sampled images lose a lot of important information, because image content is complex and its information density is low; processing the whole image set has therefore become an urgent demand. The distributed storage and computing offered by cloud computing can provide a solution. Hadoop is an open-source platform that can process large data sets, and the key question of this thesis is how to design an image processing platform based on Hadoop.

The major work of this thesis is as follows:

1. This thesis extends the basic data types of Hadoop and designs two new data types that can store image files. Combined with OpenCV, the new data types provide rich functions that simplify image processing. To meet the different demands of image processing, this thesis also designs image input and output formats. With this work, users can process huge numbers of images on the Hadoop platform with MapReduce jobs.

2. Hadoop files support a write-once, read-many mode but do not support random modification. To delete some key-value pairs from a SequenceFile, one must first traverse the entire file, drop the pairs to be deleted, and generate a new SequenceFile, which is then used as the input of MapReduce. This thesis implements an improved SequenceFileInputFormat that excludes user-defined key-value pairs from a SequenceFile and passes the remaining pairs to MapReduce as input, while the SequenceFile itself is left unchanged.

3. This thesis designs a platform that can process and mine huge numbers of images. In addition, several image processing scenarios are introduced and implemented on this platform. Finally, the feasibility and performance of the platform are tested.
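
The read-time filtering described in item 2 can be illustrated with a minimal sketch. It is not the thesis's implementation, which the abstract does not show, and it deliberately avoids Hadoop classes: an insertion-ordered map stands in for the SequenceFile, and the names `SkipKeysReaderSketch`, `FilteringReader`, and `visibleKeys` are hypothetical. The point is only that the reader skips keys listed for deletion while the stored data is never rewritten.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Set;

// Hypothetical sketch: models "skip deleted keys while reading" with plain
// Java collections instead of real Hadoop SequenceFile classes.
class SkipKeysReaderSketch {

    // Wraps an iterator over stored key-value pairs and hides every pair
    // whose key is in the user-supplied deletion set.
    static final class FilteringReader<K, V> implements Iterator<Map.Entry<K, V>> {
        private final Iterator<Map.Entry<K, V>> source;
        private final Set<K> deletedKeys;
        private Map.Entry<K, V> lookahead; // null once the source is exhausted

        FilteringReader(Iterator<Map.Entry<K, V>> source, Set<K> deletedKeys) {
            this.source = source;
            this.deletedKeys = deletedKeys;
            advance();
        }

        // Moves to the next pair that is not marked as deleted.
        private void advance() {
            lookahead = null;
            while (source.hasNext()) {
                Map.Entry<K, V> candidate = source.next();
                if (!deletedKeys.contains(candidate.getKey())) {
                    lookahead = candidate;
                    return;
                }
            }
        }

        @Override
        public boolean hasNext() {
            return lookahead != null;
        }

        @Override
        public Map.Entry<K, V> next() {
            if (lookahead == null) {
                throw new NoSuchElementException();
            }
            Map.Entry<K, V> result = lookahead;
            advance();
            return result;
        }
    }

    // Collects the keys a downstream consumer (a map task, in the analogy)
    // would see after filtering.
    static List<String> visibleKeys(Map<String, byte[]> stored, Set<String> deletedKeys) {
        List<String> keys = new ArrayList<>();
        Iterator<Map.Entry<String, byte[]>> reader =
                new FilteringReader<>(stored.entrySet().iterator(), deletedKeys);
        while (reader.hasNext()) {
            keys.add(reader.next().getKey());
        }
        return keys;
    }

    public static void main(String[] args) {
        // Stand-in for a SequenceFile of (file name, image bytes) pairs.
        Map<String, byte[]> stored = new LinkedHashMap<>();
        stored.put("a.jpg", new byte[] {1});
        stored.put("b.jpg", new byte[] {2});
        stored.put("c.jpg", new byte[] {3});

        System.out.println(visibleKeys(stored, Collections.singleton("b.jpg")));
        System.out.println(stored.size()); // the stored pairs are untouched
    }
}
```

In a real Hadoop job the same check would presumably sit inside a custom record reader, for example in its `nextKeyValue()` loop, so that no intermediate SequenceFile has to be generated; that placement is an assumption, not something the abstract states.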
Keywords/Search Tags: Hadoop, huge image processing, cloud computing, file input and output format