
Research Of Scene Image Classification On Spark Environment

Posted on: 2019-05-27
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Ji
Full Text: PDF
GTID: 2428330566967588
Subject: Communication and Information System
Abstract/Summary:
Scene image classification is a step that information retrieval systems and search engines cannot ignore. It has important applications in image processing and recognition for public security, medical, and remote sensing data. In this thesis, the random forest algorithm is applied to scene image classification, which involves image feature extraction and feature clustering as pre-processing steps. A comprehensive study of the algorithm shows that random forest classifies well and is suitable for parallel computing under the big data framework Spark. However, because the algorithm uses only a single criterion to split nodes while building decision trees, its classification accuracy does not reach its best. To improve image classification accuracy, this thesis proposes an improved Self-Adaptive Node Splitting Random Forest (SANS-RF) algorithm. The independent splitting methods ID3 and CART are recombined, and new splitting rules are obtained through adaptive parameter selection. On the basis of the Bag of Words model, the Spatial Pyramid Model is introduced to extract image features: the image is divided into different grids, and the K-means algorithm is then used for feature clustering. Finally, the method is combined with the Spark platform's machine learning library, and the improved random forest splitting algorithm is used to classify the images. The experimental results show that properly selecting the number of Spatial Pyramid layers can effectively eliminate the influence of image spatial location information on the extracted features. When the number of feature clusters and the number of decision trees are determined experimentally and the optimal coefficients are obtained by the self-adaptive method, the SANS-RF algorithm effectively improves the classification accuracy for scene images and remedies the defects of the node splitting method in the random forest algorithm. It can also be combined with Spark for parallel computing, which further improves the efficiency of the algorithm.
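
The abstract does not give the SANS-RF splitting rule itself, so the following is only an illustrative sketch of how an ID3-style criterion (information gain, based on entropy) and a CART-style criterion (Gini impurity decrease) could be combined when scoring a candidate split. The convex combination, the weight `alpha`, and the function names `combined_score` and `best_threshold` are assumptions made for this example and are not taken from the thesis. Here `alpha` is a fixed input, whereas the thesis selects its coefficients adaptively.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy of a label list (the impurity measure behind ID3)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a label list (the impurity measure behind CART)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def impurity_decrease(impurity, parent, left, right):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(parent) - weighted


def combined_score(parent, left, right, alpha):
    """Hypothetical combined criterion: alpha weights the entropy-based gain,
    (1 - alpha) weights the Gini-based gain. This form is an assumption."""
    info_gain = impurity_decrease(entropy, parent, left, right)
    gini_gain = impurity_decrease(gini, parent, left, right)
    return alpha * info_gain + (1.0 - alpha) * gini_gain


def best_threshold(values, labels, alpha):
    """Scan midpoints between distinct sorted feature values and return the
    (threshold, score) pair with the highest combined score."""
    pairs = sorted(zip(values, labels))
    best = (None, -1.0)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no valid threshold between equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = combined_score(labels, left, right, alpha)
        if score > best[1]:
            best = (threshold, score)
    return best
```

Setting `alpha` to 1 reduces the score to plain information gain, and setting it to 0 reduces it to plain Gini decrease, so the single-criterion behaviour criticised in the abstract corresponds to the two endpoints of this sketch.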
Keywords/Search Tags:image classification, random forest, node splitting, spatial pyramid model