
High Dimensional Multispectral Data Classification By Machine Learning

Posted on: 2003-04-15
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J T Xia
Full Text: PDF
GTID: 1118360092466155
Subject: Circuits and Systems

Abstract/Summary:
The theories and methods for high-dimensional multispectral data classification with limited training samples are studied; this work forms part of the research content of the National 863 Hi-Tech Program, the 973 Project, and the Ministry of Education PhD Fund. With the development of sensor technology, multispectral sensors can now collect data in as many as several hundred spectral bands at once. High-dimensional multispectral data, characterized by high spectral resolution, high spatial resolution, and large dynamic range, provide rich information about the earth's surface. However, because the number of training samples is limited and the data dimension is high, the performance of traditional pattern classification algorithms deteriorates. This thesis addresses several issues concerning machine learning and the classification of high-dimensional multispectral data with limited training samples, based on statistical learning theory (SLT), support vector machines (SVM), and artificial neural networks (ANN). The main work and results are outlined as follows:

1. The characteristics of high-dimensional multispectral data are studied, and the difficulties that degrade the performance of traditional pattern classification algorithms are carefully analyzed. By applying statistical learning theory and support vector machines to high-dimensional multispectral data classification, the Hughes phenomenon is mitigated and higher classification accuracy is obtained. The relations between SVM performance and the kernel function, the support vectors, the training set, the data dimension, and so on are studied.

2. A fast SVM training algorithm based on boundary sample selection (BSS-SVM) is proposed. Instead of using the full training set, this algorithm selects boundary samples from the training samples to train the SVM. The scale of the training set is thus greatly reduced, and the training speed of the SVM is improved enormously.
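The boundary-sample idea can be sketched with a simple distance-based heuristic: keep only the training samples closest to the opposite class, then train the SVM on that subset. This is a minimal illustration, not the thesis's actual BSS or FCMBSS algorithm; the function name, the `keep_ratio` parameter, and the nearest-opposite-class score are all illustrative assumptions.

```python
import numpy as np

def select_boundary_samples(X, y, keep_ratio=0.3):
    """Hypothetical stand-in for boundary sample selection: score each
    sample by its distance to the nearest sample of a different class,
    and keep the fraction with the smallest distances (near the boundary)."""
    n = len(X)
    scores = np.empty(n)
    for i in range(n):
        other = X[y != y[i]]                      # samples of the other classes
        scores[i] = np.min(np.linalg.norm(other - X[i], axis=1))
    k = max(1, int(keep_ratio * n))
    return np.argsort(scores)[:k]                 # smallest distance = near boundary

# toy 1-D example: two classes meeting near x = 0
X = np.array([[-3.0], [-2.0], [-0.1], [0.1], [2.0], [3.0]])
y = np.array([0, 0, 0, 1, 1, 1])
idx = select_boundary_samples(X, y, keep_ratio=0.34)
print(sorted(idx.tolist()))  # → [2, 3], the two points nearest the boundary
```

An SVM would then be trained on `X[idx], y[idx]` only; since the decision boundary depends only on the support vectors, discarding interior samples costs little accuracy.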
Because the decision boundary of an SVM is determined only by the support vectors, classification accuracy is almost preserved when the other samples are omitted. A new boundary sample selection algorithm based on fuzzy clustering (FCMBSS) is also proposed to accelerate the boundary sample selection.

3. SVM is designed for binary classification and cannot solve multiclass problems directly. A new SVM framework for multiclass problems (ECC-SVM) is proposed, which uses error-correcting codes to reduce the multiclass problem to multiple binary problems. Each class is assigned a binary codeword, and a set of SVMs is then used to solve the binary problems. The generalization performance of ECC-SVM is analyzed; it is determined by the code length, the Hamming distance, the coding sequence, and the margins of the SVMs. The 1-v-R SVM, which is widely used for multiclass problems, is equivalent to ECC-SVM with a particular set of codes, so the generalization performance of 1-v-R SVM is also analyzed.

4. Although Double Parallel Feedforward Neural Networks (DPFNN) have been successfully used to classify multispectral images, their generalization performance had not previously been studied. This thesis analyzes, in theory, the relationship between the generalization performance of DPFNN and the weight values. The result shows that the weights of the output-layer neurons control the generalization performance of DPFNN. Based on this result, a new approach for improving the generalization performance of DPFNN is proposed, which regularizes the output-layer weights during the learning process. The new algorithm can also be used to train other multi-layer feedforward neural networks, greatly improving their generalization ability.

5. A new feature extraction algorithm in kernel space is proposed, which uses the Bhattacharyya distance as its criterion function. The data are first nonlinearly mapped into a high-dimensional kernel space.
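The error-correcting-code decoding at the heart of ECC-SVM can be shown in miniature: each class gets a binary codeword, the binary classifiers produce a bit string, and the predicted class is the codeword at minimum Hamming distance. The 4-bit codebook below is an arbitrary illustration, not the codes analyzed in the thesis.

```python
import numpy as np

# Hypothetical 3-class codebook: one row (codeword) per class,
# one column per binary SVM.
CODEBOOK = np.array([[0, 0, 0, 0],
                     [1, 1, 1, 0],
                     [0, 1, 0, 1]])

def ecc_decode(bits):
    """Predict the class whose codeword has the smallest Hamming
    distance to the concatenated binary-classifier outputs `bits`."""
    dists = np.sum(CODEBOOK != np.asarray(bits), axis=1)
    return int(np.argmin(dists))

# A single bit error in the outputs is still decoded correctly:
print(ecc_decode([1, 1, 1, 1]))  # → 1 (distance 1 to class 1's codeword)
```

This is also why, as the thesis notes, 1-v-R SVM is a special case of ECC-SVM: its codebook is simply the identity matrix (one "on" bit per class), which has small Hamming distance between codewords and hence little error-correcting margin.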
Then a set of feature vectors is found such that the Bhattacharyya distance between the classes, after projection onto these feature vectors in a lower-dimensional feature space, is maximized. Thus...
Keywords/Search Tags: Remote Sensing, Multispectral, Machine Learning, Pattern Recognition, Generalization Performance, Statistical Learning Theory, Support Vector Machine, Double Parallel Feedforward Neural Networks, Feature Extraction