
The Big Data Processing Framework Based On Large-Scale And High-Dimensional Image Data

Posted on: 2015-05-03
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Lin
Full Text: PDF
GTID: 2308330473951754
Subject: Computer technology
Abstract/Summary:
At present, digital image processing has been widely developed and applied in many different fields. Meanwhile, with the popularity of cloud computing, big data processing has attracted growing attention. Several industry giants have conducted research in this area, in which collecting, analyzing, and processing large-scale data are the most important tasks. In particular, big data platforms such as Hadoop have been used to build big data applications. However, a general framework for big image data is not yet available.

This thesis designs a big data processing framework based on Hadoop MapReduce, an open-source platform, to process large-scale, high-dimensional image data. The proposed framework can run image processing algorithms that are traditionally implemented on a single machine in a parallel manner, which lays a foundation for other, more complex algorithms. This research compensates for the shortcomings of traditional image processing methods, such as long running time and low memory efficiency, and improves overall performance.

Building on traditional single-node image processing and existing distributed image processing, this research mainly studies how to improve the efficiency of large-scale, high-dimensional image processing without compromising the quality of the results. The specific work is as follows:

1. We give a detailed description and analysis of the shortcomings of traditional image processing and current distributed image processing, including the poor time efficiency of a single machine, low memory efficiency, and the drawbacks of current distributed processing on small image files. From this analysis, we identify the entry point of this research and derive the design requirements of the framework.

2. Considering the specificity of large-scale, high-dimensional images, this study designs a novel image data representation and storage model. We decode each original small image file to obtain its key information, store that information in the newly designed data representation, and then store the representations in a big-file storage model. We also build an index for the storage model and store the big file in the distributed file system.

3. We propose a new parallel image processing model. Based on the image data representation and storage model above, we design a new I/O layer on top of MapReduce, which allows image algorithms to be executed in parallel and improves time efficiency.

4. We propose a low-latency scheduling framework. This framework responds to real-time requests with low latency and reads predefined parameters from the configuration file to match incoming requests. If a match succeeds, the related processing algorithms are woken up, and the Capacity Scheduler is used to improve the load balance of the cluster.

5. Based on the parallel image processing model, we implement two representative algorithms, Harris corner detection and SIFT feature extraction, which validate the feasibility and efficiency of the parallel image processing model and demonstrate the stability and load tolerance of the low-latency scheduling framework.
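The big-file storage model for small image files can be illustrated with a minimal sketch. This is not the thesis's actual on-disk format (the record layout and function names here are hypothetical); it only shows the general idea of concatenating many small image records into one large blob while keeping an offset index for random access, which is what avoids the small-file problem on a distributed file system.

```python
import struct

def pack_records(records):
    """Pack (key, payload_bytes) pairs into one blob plus an offset index."""
    blob = bytearray()
    index = {}  # key -> (offset, length) of the payload within the blob
    for key, payload in records:
        index[key] = (len(blob), len(payload))
        # length-prefix each payload so records are self-delimiting
        blob += struct.pack(">I", len(payload)) + payload
    return bytes(blob), index

def read_record(blob, index, key):
    """Random-access read of one packed record via the index."""
    offset, length = index[key]
    (stored_len,) = struct.unpack_from(">I", blob, offset)
    assert stored_len == length
    return blob[offset + 4 : offset + 4 + length]

# Hypothetical tiny "image" payloads standing in for decoded image data.
imgs = [("img0.jpg", b"\xff\xd8aaa"), ("img1.jpg", b"\xff\xd8bb")]
blob, idx = pack_records(imgs)
print(read_record(blob, idx, "img1.jpg"))  # b'\xff\xd8bb'
```

In a real deployment the blob would be written to HDFS and the index kept alongside it, so that each split of the big file can be handed to a worker without the per-file metadata overhead of millions of small files.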
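The parallel image processing model can likewise be sketched as a toy in-process MapReduce. The record format, the stand-in "feature detector," and the function names are all hypothetical; the point is only to show how a per-image algorithm parallelizes naturally once each image arrives at the mapper as a whole record.

```python
from collections import defaultdict

def mapper(image_id, pixels):
    """Per-image map task: a stand-in detector counting bright pixels."""
    strong = sum(1 for row in pixels for p in row if p > 128)
    yield ("strong_pixels", strong)

def reducer(key, values):
    """Aggregate per-image counts into a global total."""
    yield (key, sum(values))

def run_mapreduce(records, mapper, reducer):
    """Simulate map -> shuffle -> reduce over (image_id, pixels) records."""
    shuffled = defaultdict(list)
    for image_id, pixels in records:          # map phase
        for k, v in mapper(image_id, pixels):
            shuffled[k].append(v)
    out = {}
    for k, vs in shuffled.items():            # reduce phase
        for rk, rv in reducer(k, vs):
            out[rk] = rv
    return out

records = [("a", [[0, 200], [130, 10]]), ("b", [[255, 255], [0, 0]])]
print(run_mapreduce(records, mapper, reducer))  # {'strong_pixels': 4}
```

On a real Hadoop cluster the mapper body would be replaced by the actual algorithm (e.g. Harris corner detection per image), with the framework's custom I/O layer delivering each decoded image record to a map task.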
Keywords/Search Tags:Big data, Image processing, Data representation, MapReduce, Distributed system