Research On Method Of Feature Extraction For Biological Data And Its Application

Posted on: 2013-05-17 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: A Yang
GTID: 1260330425983966 | Subject: Computer application technology
Abstract/Summary:
With the rapid development of high-throughput technologies, a flood of biomedical data has been generated. One of the most challenging problems of the post-genome era is how to mine meaningful biological knowledge and patterns from these massive data sets. Although sequence data are growing rapidly, the functions of many genes and proteins involved in important life activities remain unknown. Discovering functional information about genes and proteins directly from the raw data is difficult, both because biological data are complex and because different research areas apply different evaluation criteria. Researchers have therefore turned to feature extraction to uncover the regularities in bioinformatics data. Feature extraction is one of the most fundamental problems in bioinformatics, and the quality of a feature extraction algorithm directly determines the accuracy of the information extracted from biological data and of its subsequent analysis. This dissertation studies in depth the extraction of features from gene and protein data. Taking into account the characteristics of the data and their application background, we propose three feature extraction algorithms and verify their accuracy and reliability. The main contributions are summarized as follows:
(1) Protein feature extraction underlies all protein-related application problems, and incomplete characterization of proteins is one of the main factors limiting the effectiveness of feature extraction. To address this problem, this dissertation presents a sequence feature extraction method based on mixed features. Several kinds of protein sequence information are combined into a single protein feature vector, which is then used as the input of an SVM or KNN classifier to predict the subcellular localization of the protein.
Compared with other sequence-based protein subcellular localization methods, this method works automatically, without requiring prior knowledge of the protein structure. The experimental results show that the proposed method outperforms the other methods in accuracy, which demonstrates its correctness and effectiveness.
(2) Existing protein feature extraction methods remain inadequate: most researchers emphasize local information, which leaves the constructed features incomplete. To address this problem, this dissertation proposes a digital-series feature extraction method that ignores protein structure and interaction information and instead builds the protein feature vector from hydrophobicity, polarity, charge and other physicochemical properties. The features obtained in this way capture both the global and the local information of the protein sequence. The extracted feature vectors are combined with the K-nearest neighbor (KNN) classifier to predict protein functional classes, addressing function prediction for proteins with little or no interaction information. To examine whether subcellular localization information affects protein function prediction, subcellular location information is also incorporated into these features. Protein function prediction experiments show that the method is superior to other methods in some respects, which confirms its effectiveness.
(3) This dissertation presents a new feature gene selection algorithm and applies it to tumor subtype recognition. The method combines the filter and wrapper approaches to feature selection.
First, a filter method is used to reduce the dimensionality of the gene set; then a cluster tightness index is introduced to further reduce the number of feature genes. Because interactions between genes are taken into account, the selected feature gene subset has low redundancy yet carries highly discriminative classification information, which yields high recognition accuracy. Using an SVM as the classifier, experiments on four gene expression data sets widely used in international studies show that the proposed method outperforms several other methods.
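The abstract does not list which sequence features are mixed in contribution (1). As a minimal sketch, assuming the mix is amino acid composition (20 values) plus dipeptide composition (400 values), a common choice for sequence-only localization predictors, the feature vector and a plain K-nearest-neighbor vote could look like this. The names `mixed_features` and `knn_predict` are hypothetical and are not taken from the dissertation.

```python
import math
from collections import Counter
from itertools import product

# Assumption: only the 20 standard amino acids are counted; other letters are ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def aa_composition(seq):
    """Fraction of each standard amino acid in the sequence (20 values)."""
    counts = Counter(c for c in seq if c in AMINO_ACIDS)
    total = sum(counts.values()) or 1
    return [counts[a] / total for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Fraction of each adjacent amino-acid pair in the sequence (400 values)."""
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    counts = Counter(p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS)
    total = sum(counts.values()) or 1
    return [counts[d] / total for d in DIPEPTIDES]


def mixed_features(seq):
    """Hypothetical 'mixed' vector: composition followed by dipeptide composition."""
    seq = seq.upper()
    return aa_composition(seq) + dipeptide_composition(seq)


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training vectors closest to x (Euclidean distance)."""
    dists = sorted((math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]
```

In use, one would compute `mixed_features` for every training protein with a known location and call `knn_predict(train_X, train_y, mixed_features(query))`; an SVM could replace the KNN vote without changing the feature step.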
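Contribution (2) builds the protein feature vector from hydrophobicity, polarity and charge. The abstract specifies neither the residue groupings nor how global and local information are encoded, so the sketch below assumes the three-class groupings often used in composition/transition descriptors (hydrophobicity: RKEDQN / GASTPHY / CLVIMFW; polarity: LIFWCMVY / PAGST / HQRKNED; charge: KR / neutral / DE). Global information is taken here as whole-sequence class composition and class-transition frequencies, local information as class composition inside `n_segments` contiguous pieces. The optional one-hot location suffix mirrors the idea of adding subcellular location information to the features; its encoding, like every function name here, is hypothetical.

```python
# Assumed three-class groupings per property (not specified in the abstract).
PROPERTY_GROUPS = {
    "hydrophobicity": ("RKEDQN", "GASTPHY", "CLVIMFW"),
    "polarity": ("LIFWCMVY", "PAGST", "HQRKNED"),
    "charge": ("KR", "ANCQGHILMFPSTWYV", "DE"),
}


def encode(seq, groups):
    """Map each residue to its class index 0/1/2; unknown residues are skipped."""
    lookup = {aa: i for i, members in enumerate(groups) for aa in members}
    return [lookup[aa] for aa in seq.upper() if aa in lookup]


def composition(codes):
    """Fraction of residues falling into each of the three classes."""
    n = len(codes) or 1
    return [codes.count(c) / n for c in range(3)]


def transition(codes):
    """Fraction of adjacent pairs switching between classes 0-1, 0-2 and 1-2."""
    pairs = list(zip(codes, codes[1:]))
    n = len(pairs) or 1
    return [sum(1 for a, b in pairs if {a, b} == {x, y}) / n
            for x, y in ((0, 1), (0, 2), (1, 2))]


def segments(codes, n_segments):
    """Split the encoded sequence into n_segments contiguous pieces."""
    size = len(codes) / n_segments
    return [codes[round(i * size):round((i + 1) * size)] for i in range(n_segments)]


def property_features(seq, n_segments=2):
    features = []
    for groups in PROPERTY_GROUPS.values():
        codes = encode(seq, groups)
        features += composition(codes)      # global: class composition
        features += transition(codes)       # global: class transitions
        for seg in segments(codes, n_segments):
            features += composition(seg)    # local: per-segment composition
    return features


def add_location(features, location, known_locations):
    """Append a one-hot subcellular-location encoding (hypothetical scheme)."""
    return features + [1.0 if location == loc else 0.0 for loc in known_locations]
```

With the default `n_segments=2` each protein yields 36 values (3 properties times 12), plus one value per known location when `add_location` is used; these fixed-length vectors can be passed to any KNN classifier.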
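Contribution (3) first filters genes and then prunes redundant ones with a cluster tightness index that the abstract does not define. The sketch below is a stand-in under stated assumptions: a multi-class Fisher score serves as the filter, and a greedy correlation-based clustering step (a gene is dropped when its absolute Pearson correlation with an already selected, higher-ranked gene reaches `max_corr`) replaces the tightness index. `select_genes`, `n_filter` and `max_corr` are hypothetical names, not the dissertation's.

```python
import math
from statistics import mean, pvariance


def fisher_score(values, labels):
    """Multi-class Fisher score: sum_k n_k (mu_k - mu)^2 / sum_k n_k var_k."""
    overall = mean(values)
    between = within = 0.0
    for cls in set(labels):
        group = [v for v, lab in zip(values, labels) if lab == cls]
        between += len(group) * (mean(group) - overall) ** 2
        within += len(group) * pvariance(group)
    if within == 0:
        # Perfectly tight classes: infinitely separable only if the means differ.
        return math.inf if between > 0 else 0.0
    return between / within


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either profile is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def select_genes(expr, labels, n_filter=50, max_corr=0.8):
    """expr maps gene name -> expression values, one per sample, ordered like labels."""
    # Step 1 (filter): keep the n_filter genes with the highest Fisher score.
    ranked = sorted(expr, key=lambda g: fisher_score(expr[g], labels), reverse=True)
    candidates = ranked[:n_filter]
    # Step 2 (redundancy pruning, stand-in for the tightness index): walk down the
    # ranking and keep a gene only if it is not highly correlated with a kept gene.
    selected = []
    for gene in candidates:
        if all(abs(pearson(expr[gene], expr[kept])) < max_corr for kept in selected):
            selected.append(gene)
    return selected
```

The returned subset would then be evaluated in the wrapper stage, for example by cross-validating an SVM such as scikit-learn's `sklearn.svm.SVC` on the selected genes only.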
Keywords/Search Tags: Feature extraction, Biological information, Feature gene, Tumor subtype recognition, Function prediction, Subcellular localization