
Relationships Between Evaluation Criteria of Feature Selection and Analysis on Class Imbalance Problem over VHR Remote Sensing Imagery

Posted on: 2012-11-30
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Chen
GTID: 1118330362458360
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Very high-resolution (VHR) remote sensing images are widely applied in many fields, such as national defense, agriculture, forestry, oceanography, land resources and environmental monitoring. Compared with medium- and low-resolution remote sensing images, VHR images carry rich shape, structure and texture information. Consequently, conventional pixel-based methods cannot meet the needs of VHR image processing, and object-based image analysis (OBIA), developed during the last decade, has become the mainstream method. Beyond the well-known challenge of image segmentation, however, OBIA faces a great challenge in accurately describing image objects. Specifically, the available high-dimensional features, such as texture and shape, extracted from VHR remote sensing images can degrade classifier performance. Feature selection is therefore an important research topic for OBIA on VHR remote sensing images.

Feature selection methods in pattern recognition often measure various data characteristics, such as separability, dependency and inter-feature correlation, while ignoring the implicit mutual relationships between them. To address this issue, we measure these data characteristics and derive their relationships based on filter models in feature selection. In addition, we analyze data characteristics in supervised and semi-supervised learning on sample sets of VHR remote sensing images to combat the class imbalance problem that arises in the feature analysis of VHR remote sensing objects. The work of this dissertation can be divided into two parts:

1. Criteria of separability, dependency and inter-feature correlation are presented to measure the corresponding data characteristics and to analyze their relationships.

2. In land cover/use classification, existing studies often use kernel tricks to deal with linearly nonseparable sample sets of VHR remote sensing images and overlook the class imbalance problem caused by skewed distributions. That is, the ratios of sample sizes between large classes (which contain the majority of samples) and small classes are quite large, which biases classifiers towards the large classes. When labeled samples are sufficient, we evaluate class separability and reduce the influence of linear nonseparability and class imbalance on classification from the perspective of supervised learning. When labeled samples are scarce, we make use of the unlabeled samples and evaluate the intrinsic separability of the unbalanced sample set from the perspective of semi-supervised learning.

The main innovations are summarized as follows:

1. We present different types of evaluation criteria and analyze their relationships, in order to approximate various data characteristics and describe their implicit mutual influences. First, assuming that samples and their class-conditional probabilities follow a multivariate normal distribution, we put forward three criteria: separability, dependency and inter-feature correlation. Their relationships are then established in terms of multiple linear regression to describe the mutual influences between different data characteristics. To verify the effectiveness of the criteria, a naive search strategy, feature ranking, is used in supervised and semi-supervised feature selection. On real-world datasets, the original criteria yield performance comparable to or better than state-of-the-art feature selection methods such as ReliefF and mRMR, demonstrating the application value of the criteria. To verify the relationships between the criteria, each derived criterion is obtained from the other criteria and the relationships between them. On an artificial dataset that follows a normal distribution, as well as on real-world datasets, the relationships are validated by the consistency of performance between the original criteria and the derived criteria. The mutual influences between different data characteristics are revealed by the relationships between the corresponding criteria.

2. A graph-based supervised feature selection method, named Locally Weighted Discriminating Projection (LWDP), is proposed to evaluate class separability. First, LWDP presents a generalized objective function as its criterion, which copes with possible nonlinear structure in the feature space through local analysis of the unbalanced sample set, inspired by manifold learning. Second, dissimilarity, rather than a similarity such as the heat kernel, is employed to characterize the relationships between pairwise neighbors, in order to preserve local structure and alleviate the class imbalance problem. Finally, LWDP constrains the pairwise relationships between neighbors according to class size and local class distribution, based on the consistency assumption, and introduces these constraints into the weight matrices to combat class imbalance and underlying noise. On sample sets of VHR airborne remote sensing images comprising large, medium and small classes, LWDP simultaneously improves the accuracies of all classes, especially the small classes, and alleviates the negative influence of class imbalance, linear nonseparability and underlying noise, compared with state-of-the-art and recent feature selection methods.

3. A graph-based semi-supervised feature selection method, named Asymmetrically Local Discriminant Selection (ALDS), is proposed to evaluate class separability. This method introduces prior knowledge of class sizes to design a function of asymmetric misclassification costs, and then locally explores multiple kinds of relationships between sample pairs using only a small number of labeled samples. These techniques help to counter class imbalance, to assess more accurately the ability of features to preserve geometrical and discriminant structures, and to enhance generalization in semi-supervised learning. On sample sets of VHR airborne remote sensing images and a QuickBird image comprising only large and small classes, ALDS simultaneously improves the accuracies of all classes, especially the small classes, compared with state-of-the-art and recent feature selection methods. Performance with prior knowledge exceeds that without prior knowledge, especially on highly unbalanced sample sets.
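
As a rough illustration of the feature-ranking strategy mentioned in the abstract, the following minimal Python sketch scores each feature with a filter criterion and sorts features by that score. The abstract does not give the dissertation's own criteria, so a generic Fisher-style ratio (between-class scatter over within-class scatter) is swapped in here as a stand-in for a separability criterion. The function names `fisher_score` and `rank_features` and the toy data are hypothetical, and the sketch makes no attempt to correct for class imbalance as LWDP or ALDS do.

```python
from statistics import mean, pvariance


def fisher_score(values, labels):
    """Fisher-style separability of one feature (an assumed stand-in criterion).

    Ratio of size-weighted between-class scatter to within-class scatter.
    """
    overall = mean(values)
    between = 0.0
    within = 0.0
    for c in set(labels):
        group = [v for v, y in zip(values, labels) if y == c]
        between += len(group) * (mean(group) - overall) ** 2
        within += len(group) * pvariance(group)
    # A feature with zero within-class scatter is treated as perfectly separable.
    return between / within if within > 0 else float("inf")


def rank_features(rows, labels):
    """Score every feature independently and sort indices by descending score."""
    n_features = len(rows[0])
    scores = [
        fisher_score([row[j] for row in rows], labels)
        for j in range(n_features)
    ]
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores


# Toy unbalanced sample set: six samples in a large class, two in a small class.
# Feature 0 separates the classes; feature 1 has the same mean in both classes.
rows = [
    (1.0, 5.0), (1.2, 3.0), (0.8, 4.0), (1.1, 6.0), (0.9, 5.5), (1.0, 3.5),
    (3.0, 4.0), (3.2, 5.0),
]
labels = [0, 0, 0, 0, 0, 0, 1, 1]

order, scores = rank_features(rows, labels)
print("ranking (best first):", order)
```

Ranking scores each feature in isolation, which is why it counts as a naive search strategy: it ignores redundancy between features, the aspect that an inter-feature correlation criterion is meant to capture.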
Keywords/Search Tags: VHR remote sensing images, object-oriented methods, land use classification, feature selection, graph-based filter model, relationships among criteria, partial correlation analysis, class imbalance, constraints of neighborhood unions