Extracting Method Of Biological Data Feature Based On Manifold Learning

Posted on: 2012-11-30
Degree: Master
Type: Thesis
Country: China
Candidate: X F Xing
GTID: 2178330335979725
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of information technology, the volume of data obtained directly from experiments is growing exponentially. Such data are mixed with a great deal of uncertain and redundant information, so they are difficult to process directly. Studying the characteristics of biological data is therefore significant: it helps to speed up the processing of biological data and to improve the accuracy of its analysis, and it also plays an important role in research in biology, medicine and pharmacy. At present, the data obtained directly from computer vision, gene microarray analysis and biometrics are high-dimensional. How to extract valid information effectively from such high-dimensional data is therefore an important subject for information science and technology.

This thesis studies how to extract useful information from biological data that are high-dimensional with few samples, reducing them to low-dimensional small-sample data, and how to build a feature classification model that identifies different types of biological data more accurately and effectively. The work covers feature extraction methods for biological data, the structural design of the neural network, and the choice of manifold learning algorithms.

1. Feature extraction of biological data. To classify biological data, we must first extract their characteristic information and convert it into a form that a computer can handle, then extract features from the large amount of high-dimensional data, selecting the main features and removing redundant and irrelevant ones. How to extract the main features of biological data and how to choose the extraction method are therefore critical, and different feature extraction methods bring out different biological characteristics.
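To make "selecting the main features and removing redundant ones" concrete, here is a minimal sketch of principal component analysis (PCA), one of the linear methods named in this abstract. It is not the thesis's implementation: the toy data, the function names and the use of power iteration to find the leading component are all assumptions made for illustration, and only the Python standard library is used.

```python
import math
import random

def center(rows):
    """Subtract the per-feature mean from every sample."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def covariance(centered):
    """Sample covariance matrix (d x d) of already-centered data."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def leading_component(cov, iters=200, seed=0):
    """Power iteration: approximates the eigenvector with the largest eigenvalue."""
    rng = random.Random(seed)
    v = [rng.random() for _ in cov]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(len(v))) for i in range(len(v))]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def project(centered, v):
    """Reduce each sample to a single score along direction v."""
    return [sum(r[j] * v[j] for j in range(len(v))) for r in centered]

# Hypothetical toy data: 6 samples, 3 features.
# Feature 2 is roughly twice feature 1 (redundant); feature 3 is small noise.
samples = [
    [1.0, 2.1, 0.3],
    [2.0, 3.9, 0.1],
    [3.0, 6.2, 0.2],
    [4.0, 7.8, 0.4],
    [5.0, 10.1, 0.2],
    [6.0, 12.0, 0.3],
]

centered = center(samples)
pc1 = leading_component(covariance(centered))
scores = project(centered, pc1)  # 3 features reduced to 1 per sample
```

The two correlated features collapse into one score per sample, while the noise feature receives almost no weight. For real microarray data, where features far outnumber samples, forming the full covariance matrix this way would be costly, so a practical implementation would likely use a different decomposition.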
The main feature extraction methods at present include linear methods such as principal component analysis (PCA) and independent component analysis (ICA); nonlinear methods such as the nonlinear PCA network and the Kohonen method; and manifold learning methods such as isometric mapping (Isomap), locally linear embedding (LLE) and nonnegative matrix factorization (NMF), among others. This thesis applies the feature extraction methods above. Experimental results show that they give different results for different data sets and classification models.

2. Building the classification model. This thesis uses the colon dataset and the leukemia cancer dataset; classification and prediction are based on the useful information extracted from the data in the preceding step. By analyzing the relationships within this information we derive rules and then identify the unknown type of the data to be predicted. After feature extraction the two datasets are still relatively high-dimensional and the computation is relatively heavy, so we use a neural network. A neural network has relatively strong self-organization, self-learning and adaptive abilities, so it can quickly and efficiently learn the class-related features contained in the sequence feature information during training and then carry out the prediction. In addition, a neural network has good fault tolerance. To achieve the structure prediction, optimization of the neural network includes both structure optimization and parameter optimization. The choice of optimization algorithm is important, because different optimization algorithms differ in time efficiency and in prediction accuracy. We therefore optimized the neural network with several optimization algorithms and selected the one better suited to the two datasets.
Experimental results showed that the BP neural network can improve the prediction accuracy on the data to a certain extent, and that a single-output neural network can achieve better results than a multi-output neural network.
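As a rough illustration of the single-output back propagation (BP) network mentioned above, the following sketch trains a network with one hidden layer by plain per-sample gradient descent. The abstract does not give the architecture or the optimization algorithms actually used, so the hidden-layer size, learning rate, squared-error loss, sigmoid activations and two-class toy data below are all assumptions, and the function names are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(W1, W2, x):
    """Return hidden activations and the single output for input x."""
    d, hidden = len(x), len(W1)
    h = [sigmoid(sum(w[j] * x[j] for j in range(d)) + w[d]) for w in W1]
    o = sigmoid(sum(W2[k] * h[k] for k in range(hidden)) + W2[hidden])
    return h, o

def train_bp(X, y, hidden=3, lr=0.5, epochs=2000, seed=0):
    """Train with backpropagation; returns weights and mean loss per epoch."""
    rng = random.Random(seed)
    d = len(X[0])
    # Each weight row ends with a bias term.
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(d + 1)] for _ in range(hidden)]
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden + 1)]
    losses = []
    for _ in range(epochs):
        total = 0.0
        for x, t in zip(X, y):
            h, o = forward(W1, W2, x)
            total += 0.5 * (o - t) ** 2
            # Error signals, propagated from the output back to the hidden layer.
            delta_o = (o - t) * o * (1.0 - o)
            delta_h = [delta_o * W2[k] * h[k] * (1.0 - h[k]) for k in range(hidden)]
            # Gradient-descent updates for output and hidden weights.
            for k in range(hidden):
                W2[k] -= lr * delta_o * h[k]
            W2[hidden] -= lr * delta_o
            for k in range(hidden):
                for j in range(d):
                    W1[k][j] -= lr * delta_h[k] * x[j]
                W1[k][d] -= lr * delta_h[k]
        losses.append(total / len(X))
    return W1, W2, losses

# Hypothetical two-class toy data standing in for extracted low-dimensional features.
X = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.2, 0.4],
     [0.8, 0.9], [0.9, 0.7], [0.7, 0.8], [0.9, 0.9]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

W1, W2, losses = train_bp(X, y)
outputs = [forward(W1, W2, x)[1] for x in X]
predicted = [1 if o > 0.5 else 0 for o in outputs]
```

A single sigmoid output thresholded at 0.5 is one simple way to encode a two-class decision; a multi-output variant would instead use one output unit per class.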
Keywords/Search Tags: Isomap, Locally Linear Embedding (LLE), Nonnegative Matrix Factorization (NMF), Back Propagation Neural Network (BPNN)