| With the successive launch of high spatial resolution remote sensing satellites, such as IKONOS, QuickBird, the WorldView series, GeoEye, GF-1/2, ZY-3, GJ-1, and JL-1, high spatial resolution Earth observation technology has developed rapidly. Compared with low and medium spatial resolution remote sensing images, high spatial resolution images provide more detailed information (e.g., structural, textural, and spectral information) and a more accurate spatial distribution, which makes remote sensing image interpretation more accurate. However, as spatial resolution improves, interpretation is increasingly affected by complex background interference and the variable structures of ground objects. The intra-class variance increases while the inter-class variance decreases, which poses great challenges for the interpretation of high spatial resolution remote sensing images. Traditional pixel-level and object-oriented classification methods can hardly meet the needs of high-level semantic interpretation. Therefore, scene classification of remote sensing images has attracted extensive attention and has become an active research topic in remote sensing. This thesis systematically summarizes the theories and methods involved in scene classification of high spatial resolution remote sensing images. Focusing on the feature representation problem, it studies feature representation methods at three levels: shallow, mid-level, and deep feature representation. In addition, multiple-feature integration methods are considered to combine complementary features with high descriptive ability, which can achieve higher classification accuracy than single features. The main contents and contributions of this thesis are as follows: (1) This thesis constructs a large-scale dataset for scene
classification of high spatial resolution remote sensing images. The commonly used remote sensing scene classification datasets are small in scale, containing only one to two thousand sample images, and their scene types are not rich enough. Thus, they cannot reflect the distribution of remote sensing scenes in the real world, and scene classification methods cannot be tested and evaluated accurately on them, which severely limits the development of scene classification algorithms, especially deep learning-based methods. To overcome this problem, this thesis studies the existing datasets, investigates land-use/land-cover classification standards worldwide, and considers the practical difficulties of data annotation in order to set up a reasonable scene category system. Then, a large-scale dataset is constructed by crowdsourcing, which can serve as a standard dataset for testing scene classification algorithms and provide data support for related research. (2) This thesis systematically studies and analyzes feature representation methods for scene classification at three levels, i.e., shallow, mid-level, and deep feature representation. It summarizes and implements some commonly used feature representation methods and compares them on the large-scale dataset constructed in this thesis, named AID, as well as on other commonly used datasets. All the code and experimental results have been published and can be used as a benchmark for scene classification of high spatial resolution remote sensing images. Based on the above study and analysis, this thesis utilizes a random sampling method
to extract information from multiple ground objects in a scene image, and proposes an improved Bag-of-Visual-Words model that mines the spatial distribution of visual words to incorporate local relative spatial information, which improves the classification accuracy of mid-level feature representation methods with high efficiency. In addition, this thesis introduces a deep feature representation method based on the transfer of deep CNNs, which can effectively improve the descriptive and discriminative ability of deep features. (3) This thesis proposes a wrapper-based feature selection method to obtain a multiple-feature representation at the "shallow-middle-deep-multiple" levels. The wrapper-based method combines the feature selection procedure with the classification procedure and uses classification accuracy as the measure of the importance of each feature. It can therefore adaptively select features with complementary information for each dataset and construct a multi-level feature representation that describes the image from different aspects and at different levels, obtaining state-of-the-art results. |
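The wrapper idea in contribution (3), scoring candidate features by the classification accuracy they yield, can be illustrated with a minimal sketch. It is not the thesis's implementation: the greedy forward search, the nearest-centroid classifier, the toy data, and every name used (`wrapper_forward_selection`, `toy_split`, the feature labels "shallow", "mid", "deep") are assumptions chosen only for illustration.

```python
def concat(sample, names):
    """Concatenate the chosen feature vectors of one sample."""
    return [x for name in names for x in sample[name]]


def nearest_centroid_accuracy(train, test):
    """Fit class centroids on train, return accuracy on test.

    train/test are lists of (vector, label) pairs. This simple
    classifier stands in for whatever classifier is actually used.
    """
    sums, counts = {}, {}
    for vec, label in train:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    centroids = {lab: [s / counts[lab] for s in acc]
                 for lab, acc in sums.items()}

    def predict(vec):
        return min(centroids, key=lambda lab: sum(
            (a - b) ** 2 for a, b in zip(vec, centroids[lab])))

    return sum(predict(vec) == label for vec, label in test) / len(test)


def wrapper_forward_selection(train, val, feature_names):
    """Greedy forward wrapper selection (an assumed search strategy).

    Each round adds the feature whose inclusion gives the highest
    validation accuracy, and stops when no feature improves it.
    """
    selected, best_acc = [], 0.0
    remaining = list(feature_names)
    while remaining:
        scored = []
        for name in remaining:
            trial = selected + [name]
            acc = nearest_centroid_accuracy(
                [(concat(s, trial), y) for s, y in train],
                [(concat(s, trial), y) for s, y in val])
            scored.append((acc, name))
        acc, name = max(scored)
        if acc <= best_acc:
            break  # no remaining feature adds information
        selected.append(name)
        remaining.remove(name)
        best_acc = acc
    return selected, best_acc


def toy_split():
    """Hypothetical 4-class data: 'mid' and 'deep' each carry one bit
    of the label (complementary), 'shallow' is unrelated to the label."""
    data = []
    for rep in range(2):
        for label in range(4):
            data.append(({"shallow": [float(rep)],
                          "mid": [float(label % 2)],
                          "deep": [float(label // 2)]}, label))
    return data, list(data)


if __name__ == "__main__":
    train, val = toy_split()
    print(wrapper_forward_selection(train, val, ["shallow", "mid", "deep"]))
```

On this toy data the search keeps the two complementary features and discards the uninformative one, because adding it never raises the validation accuracy. Real scene features would be high-dimensional descriptors, and the accuracy would normally be estimated by cross-validation rather than a single split.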