| With the development of Internet and multimedia technology, large volumes of digital images and data have become available online and have changed our lives. It is therefore important for computers to understand image content in detail. Drawing on related fields such as machine learning and artificial intelligence, image classification technology aims to assign large numbers of digital images to categories automatically, and it is a key step toward solving the image understanding problem. The random forest algorithm is based on the decision tree model: it consists of a collection of decision trees and generally gives better classification results than a single tree. It is widely used in object classification and offers a new approach to the image classification problem. However, it is time-consuming on massive image data. To address this problem, this thesis studies the parallelization of image processing on the Hadoop platform and parallelizes the random forest algorithm with the MapReduce scheme. The main work of the thesis is summarized as follows. First, the thesis introduces the Hadoop platform, including the HDFS distributed file system and the MapReduce framework. Second, it discusses the key technologies of image classification, summarizing the commonly used features with a focus on the scale-invariant feature transform (SIFT) and the Bag of Visual Words model, and then introducing several important pattern classification methods. Third, it studies the parallelization of the random forest algorithm; the double parallelization of the algorithm improves its operational efficiency. 
Building on this work, the thesis builds an image classification system comprising a Hadoop image processing interface, dense SIFT extraction, a bag-of-words model, spatial pyramid construction, and random forest classifier training. Finally, experiments show that the approach improves both running time and performance: the parallelized image processing method on the Hadoop platform greatly increases the speed of processing massive image data, and the random forest parallelization method also improves the efficiency of image classification. |
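
The idea of training a random forest in a MapReduce style can be illustrated with a minimal single-machine sketch. It is not the implementation used in the thesis and does not use the Hadoop API. It assumes that each map task trains one tree on a bootstrap sample and that the reduce step combines the trees by majority vote. The one-level trees (stumps) and the names `train_stump`, `predict_stump`, `forest_predict`, and `train_forest` are hypothetical and chosen only to keep the example short.

```python
import random
from collections import Counter


def train_stump(task):
    """Map step: train one decision stump on a bootstrap sample."""
    samples, seed = task
    rng = random.Random(seed)
    # Bootstrap: draw len(samples) items with replacement.
    boot = [rng.choice(samples) for _ in samples]
    # Random feature selection, as in a random forest.
    feature = rng.randrange(len(samples[0][0]))
    best = None
    for x, _ in boot:
        threshold = x[feature]
        left = [y for xi, y in boot if xi[feature] <= threshold]
        right = [y for xi, y in boot if xi[feature] > threshold]
        if not left or not right:
            continue
        left_label, left_hits = Counter(left).most_common(1)[0]
        right_label, right_hits = Counter(right).most_common(1)[0]
        score = left_hits + right_hits  # correctly classified samples
        if best is None or score > best[0]:
            best = (score, threshold, left_label, right_label)
    if best is None:
        # No valid split: always predict the majority label.
        label = Counter(y for _, y in boot).most_common(1)[0][0]
        return (feature, 0.0, label, label)
    _, threshold, left_label, right_label = best
    return (feature, threshold, left_label, right_label)


def predict_stump(stump, x):
    feature, threshold, left_label, right_label = stump
    return left_label if x[feature] <= threshold else right_label


def forest_predict(forest, x):
    """Reduce step: combine the per-tree predictions by majority vote."""
    votes = Counter(predict_stump(stump, x) for stump in forest)
    return votes.most_common(1)[0][0]


def train_forest(samples, n_trees=25, seed=0):
    # Each task is independent, so a MapReduce job could run them
    # on different nodes; the built-in map() stands in for that here.
    tasks = [(samples, seed + i) for i in range(n_trees)]
    return list(map(train_stump, tasks))


# Toy data: (feature vector, class label), two well-separated classes.
data = [((float(i), float(i)), 0) for i in range(4)]
data += [((float(i), float(i)), 1) for i in range(10, 14)]
forest = train_forest(data)
print(forest_predict(forest, (-1.0, -1.0)), forest_predict(forest, (20.0, 20.0)))
```

Because the trees are trained independently of one another, the map tasks need no communication, which is what makes this stage straightforward to distribute.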