
Distributed Implementation Of Image Analysis Based On Hadoop

Posted on: 2018-12-30 | Degree: Master | Type: Thesis
Country: China | Candidate: S J Guo | Full Text: PDF
GTID: 2428330542988023 | Subject: Biomedical engineering
Abstract/Summary:
Image analysis and processing is a rapidly developing discipline with applications in many fields, including medicine, pattern recognition, and image retrieval. With the growth of the Internet, these applications are becoming increasingly interconnected. People no longer seek information only as text but as a combination of multimedia data, and the result is a sharp increase in data volume. This abundance benefits researchers, but processing so much data raises a problem of computational efficiency.

In this thesis, we use Hadoop, the most popular big data processing platform, together with the distributed computing model of MapReduce to implement distributed clustering of image features and distributed training of a neural network. The tasks run on several ordinary, low-configuration computers whose nodes communicate with each other, and the combined computing capacity exceeds that of a workstation of the same cost. This addresses both the low image processing efficiency of an ordinary computer and the high cost of a workstation with equivalent capacity. We created instances with a common configuration on the Amazon EC2 service platform as worker nodes and built a multi-node Hadoop cluster for distributed processing of image data.

Based on the distributed model of MapReduce, this thesis proposes a method for automatic image segmentation, distribution, classification, and multithreaded downloading based on the target class name, which turns a single-task problem into distributed, parallel, multithreaded tasks. Combining distribution and parallelization improved the efficiency of image acquisition.

To address the low efficiency of KMeans clustering and neural network training, this thesis proposes a MapReduce-based distributed training algorithm for KMeans clustering and a MapReduce-based distributed training algorithm for a BP neural network with additional momentum, improving on the commonly used distributed implementation. The most time-consuming parts, namely the iterative training step in KMeans clustering and the forward and backward computation in the BP neural network, are carried out by the Map tasks in MapReduce. Map tasks are allocated and started according to the number of data blocks, and each Map task reads its data from file and performs the computation. The final operation is completed by the reduce method of the Reduce task. Each MapReduce job performs one pass of training over the data. A termination check runs in the external user process: based on the number of iterations and the error, it either terminates the training or launches the next MapReduce job.

Finally, this thesis builds a Hadoop cluster on the Amazon EC2 platform, implements the proposed distributed training algorithms with the MapReduce framework, analyzes time efficiency for different amounts of training data and numbers of iterations, and summarizes the accuracy of the predicted values. Comparison with training on a single machine configured identically to the computing nodes verifies the efficiency of the proposed distributed training algorithms.
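The KMeans scheme described in the abstract (one Map task per data block, a Reduce step that finishes the update, and an external process that decides whether to run another job) can be illustrated with a minimal single-process sketch. This is not the thesis's code and does not use the Hadoop API: the function names `map_block`, `reduce_partials`, and `run_kmeans`, the squared-error stopping rule, and the `tol` parameter are all assumptions made for illustration.

```python
import math


def map_block(block, centroids):
    """Hypothetical Map task: process one data block.

    Assigns each point to its nearest centroid and emits, per centroid index,
    the partial coordinate sums, the point count, and the squared error.
    """
    partial = {}
    for point in block:
        dists = [
            sum((p - c) ** 2 for p, c in zip(point, centroid))
            for centroid in centroids
        ]
        k = dists.index(min(dists))
        sums, count, err = partial.get(k, ([0.0] * len(point), 0, 0.0))
        sums = [s + p for s, p in zip(sums, point)]
        partial[k] = (sums, count + 1, err + dists[k])
    return partial


def reduce_partials(partials, centroids):
    """Hypothetical Reduce task: merge Map outputs into new centroids.

    A centroid that received no points keeps its previous position.
    """
    merged = {}
    total_err = 0.0
    for partial in partials:
        for k, (sums, count, err) in partial.items():
            if k in merged:
                msums, mcount = merged[k]
                merged[k] = ([a + b for a, b in zip(msums, sums)], mcount + count)
            else:
                merged[k] = (list(sums), count)
            total_err += err
    new_centroids = list(centroids)
    for k, (sums, count) in merged.items():
        new_centroids[k] = tuple(s / count for s in sums)
    return new_centroids, total_err


def run_kmeans(blocks, centroids, max_iters=20, tol=1e-6):
    """External driver loop: one simulated MapReduce job per iteration.

    Stops when the iteration limit is reached or the total squared error
    changes by less than tol (an assumed error condition).
    """
    prev_err = math.inf
    for iteration in range(1, max_iters + 1):
        partials = [map_block(block, centroids) for block in blocks]
        centroids, err = reduce_partials(partials, centroids)
        if abs(prev_err - err) < tol:
            break
        prev_err = err
    return centroids, iteration, err
```

Emitting partial sums and counts from each Map task, rather than raw point assignments, keeps the data passed to the Reduce step proportional to the number of clusters instead of the number of points. On a real cluster the driver loop would submit one Hadoop job per iteration and read the updated centroids back from the job output.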
Keywords/Search Tags: image classification, MapReduce, Hadoop, BP neural network, distributed computing