Research On New Methods Of Unbalanced Learning In Bioinformatics

Posted on:2018-10-04

Degree:Master

Type:Thesis

Country:China

Candidate:L Y Shen

GTID:2350330512478775

Subject:Computer application technology

Abstract/Summary:

Bioinformatics is an interdisciplinary field that combines life science and computational science. It applies computational and statistical techniques to real-world problems arising from the analysis of biological data, with the aim of improving our understanding of biological processes. Class imbalance is common in both machine learning and bioinformatics, and it seriously degrades the performance of standard classifiers: applying traditional machine learning methods directly to imbalanced data often yields predictions that are biased toward the majority class. Solving the class-imbalance problem is therefore important.

Protein-ATP (Adenosine-5’-triphosphate) binding residue prediction is a typical imbalanced learning problem. ATP interacts with proteins in a wide variety of biological processes, so accurately identifying binding residues from protein sequences alone is of great significance. A common way to improve prediction performance on imbalanced problems is to balance the sizes of the classes by changing the number and distribution of the samples within them. Oversampling, a popular method of this kind, balances the class sizes by generating additional samples for the minority class.

In this study, we propose a new oversampling algorithm that synthesizes new minority-class samples with a Gaussian mixture model. The Gaussian mixture model generates the additional samples, and a data-cleaning technique, Tomek links, removes the borderline sample pairs that result from the oversampling process. Compared with several related state-of-the-art methods, the experimental results on UCI datasets demonstrate that the proposed oversampling algorithm can relieve the severity of class imbalance and help to improve classification performance. We also apply the proposed algorithm to the protein-ATP binding site prediction problem to evaluate its effectiveness. In addition, a sparse representation technique is introduced to select the generated samples that embody more explicit semantic information. The experimental results on several protein-ATP interaction benchmark datasets demonstrate the effectiveness of the proposed oversampling algorithm.
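The two main steps described in the abstract, Gaussian-mixture oversampling followed by Tomek-link cleaning, can be illustrated with a minimal sketch. This is not the thesis's implementation. It assumes scikit-learn's GaussianMixture for the mixture model and a brute-force nearest-neighbor search for Tomek links. The function names (gmm_oversample, tomek_link_mask, balance), the parameters (n_components, seed), and the toy data are all hypothetical. The sparse-representation selection step is omitted.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def gmm_oversample(X_min, n_new, n_components=2, seed=0):
    """Fit a Gaussian mixture to the minority class and draw n_new samples.

    n_components is an illustrative choice, not a value from the thesis.
    """
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="full",
                          random_state=seed)
    gmm.fit(X_min)
    X_new, _ = gmm.sample(n_new)  # sample() returns (samples, component ids)
    return X_new


def tomek_link_mask(X, y):
    """Return True for every sample that belongs to a Tomek link.

    A Tomek link is a pair of samples from different classes that are
    each other's nearest neighbor.
    """
    # Brute-force pairwise squared Euclidean distances (fine for small data).
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)           # a sample is not its own neighbor
    nn = d.argmin(axis=1)                 # index of each nearest neighbor
    mutual = nn[nn] == np.arange(len(X))  # i and nn[i] point at each other
    opposite = y[nn] != y                 # and carry different labels
    return mutual & opposite


def balance(X, y, minority_label=1, seed=0):
    """Oversample the minority class to parity, then drop Tomek-link pairs."""
    X_min = X[y == minority_label]
    n_new = int((y != minority_label).sum()) - len(X_min)
    if n_new >= 1:
        X_new = gmm_oversample(X_min, n_new, seed=seed)
        X = np.vstack([X, X_new])
        y = np.concatenate([y, np.full(n_new, minority_label, dtype=y.dtype)])
    keep = ~tomek_link_mask(X, y)         # remove both members of each link
    return X[keep], y[keep]


# Toy imbalanced data: 200 majority samples, 20 minority samples.
rng = np.random.default_rng(0)
X_toy = np.vstack([rng.normal(0.0, 1.0, size=(200, 2)),
                   rng.normal(2.5, 0.7, size=(20, 2))])
y_toy = np.array([0] * 200 + [1] * 20)

X_bal, y_bal = balance(X_toy, y_toy)
print("class counts after balancing:", np.bincount(y_bal))
```

Removing both members of each link mirrors the abstract's wording about removing borderline sample pairs. Other Tomek-link variants drop only the majority-class member, and the thesis may differ in details such as the number of mixture components or the distance measure.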

Keywords/Search Tags:

imbalanced learning, oversample, Gaussian mixture model, sparse representation, data filtering, protein-ATP binding, binding residues prediction

Related items

1	The Evolutionary Conservation-based Analysis And Prediction For DNA-binding Residues
2	Study On Protein - Vitamin Binding Site Prediction Based On Unbalanced Learning
3	Intelligence Algorithms For Protein Structure Prediction And Nucleic Acids Binding Site Annotation
4	Research On The Methods For Identifying Nucleic Acid Binding Protein And Its Binding Residues
5	Analysis And Prediction Of RNA-binding Residues In Protein Molecules
6	Sequence-based Prediction For The Protein-peptide Binding Residues
7	Two Special Types Of Protein Functional Residues Of The Prediction And Biological Sequence Alignment
8	In Silicon Prediction Of DNA-binding Residues In DNA-binding Proteins
9	Research On Prediction Of Protein-ATP And Protein-DNA Binding Sites Based On Deep Learning
10	Research On Intelligent Computing-based Methods For Protein-peptide Binding Prediction