Research On Prediction Of Protein Sub-mitochondrial Localization Based On Protein Sequence Information

Posted on: 2023-07-09
Degree: Master
Type: Thesis
Country: China
Candidate: H D Bian
GTID: 2530306788995309
Subject: Computer Science and Technology
Abstract/Summary:
The mitochondrion is a subcellular organelle found in most eukaryotic organisms, and it plays a pivotal role in several biochemical functions of the cell. Proteins located in different sub-mitochondrial compartments perform different functions. Determining the sub-mitochondrial localization of proteins is therefore important for studying protein function in mitochondria, and it provides information for drug design and cancer research. Numerous computational methods have been developed for this problem, and the key challenge is to improve the accuracy of the prediction model. This thesis addresses protein sub-mitochondrial localization and introduces the Doc2vec algorithm to encode protein sequences. A new sub-mitochondrial protein dataset, SM766-20, is proposed to expand the existing datasets. Mitochondrial inner membrane proteins are numerous, whereas proteins in the other three categories are few, which makes the datasets imbalanced. The adaptive synthetic oversampling algorithm is used to address this imbalance. A novel prediction method, Sub-Loc, is proposed to predict protein sub-mitochondrial localization. The method uses Doc2vec to embed each protein sequence into a low-dimensional vector and uses a support vector machine as the classification algorithm. The experimental results show that the adaptive synthetic oversampling algorithm effectively improves the learning ability of the classification model. The sequence embedding vector represents sub-mitochondrial proteins effectively: the linear correlation between features in the embedding vector is weak, and the similarity and feature redundancy are low, which improves the prediction performance of the classification model. In addition, Sub-Loc achieves satisfactory results on two public datasets, effectively improving the accuracy of predicting protein sub-mitochondrial localization. Finally, an online system for predicting the sub-mitochondrial localization of proteins is designed and implemented using the Django framework.
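The abstract names adaptive synthetic oversampling (commonly abbreviated ADASYN) but does not describe how it works. The following is a minimal sketch of the general technique for two classes, using only the Python standard library; it is not the thesis implementation. The function names, the toy 2-D points, the parameters `k` and `beta`, and the uniform fallback when no minority point has a majority neighbour are all assumptions made for illustration. The thesis works with four protein classes and Doc2vec embedding vectors rather than 2-D points.

```python
import math
import random


def _nearest(point, candidates, k):
    """Indices of the k candidates closest to `point` (Euclidean distance)."""
    order = sorted(range(len(candidates)),
                   key=lambda j: math.dist(point, candidates[j]))
    return order[:k]


def adasyn(minority, majority, k=3, beta=1.0, seed=0):
    """Return synthetic minority samples (two-class ADASYN sketch)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    # Label 0 = minority, 1 = majority; minority points come first.
    labelled = [(p, 0) for p in minority] + [(p, 1) for p in majority]
    # Number of synthetic samples needed; beta=1.0 aims at full balance.
    total = round((len(majority) - len(minority)) * beta)

    # Step 1: r_i = share of majority points among the k nearest
    # neighbours of minority point i (the point itself is excluded).
    ratios = []
    for i, x in enumerate(minority):
        others = [e for j, e in enumerate(labelled) if j != i]
        idx = _nearest(x, [p for p, _ in others], k)
        ratios.append(sum(others[j][1] for j in idx) / k)

    # Step 2: normalise the ratios so that they sum to 1.
    norm = sum(ratios)
    if norm == 0:
        # Assumption: fall back to uniform weights when no minority
        # point has a majority neighbour.
        weights = [1 / len(minority)] * len(minority)
    else:
        weights = [r / norm for r in ratios]

    # Step 3: harder points (larger weight) receive more synthetic
    # samples, each interpolated towards a random minority neighbour.
    synthetic = []
    for i, x in enumerate(minority):
        peers = [p for j, p in enumerate(minority) if j != i]
        neighbours = [peers[j] for j in _nearest(x, peers, k)]
        for _ in range(round(weights[i] * total)):
            z = rng.choice(neighbours)
            lam = rng.random()
            synthetic.append(tuple(a + lam * (b - a) for a, b in zip(x, z)))
    return synthetic


# Toy data: 3 minority points against 9 majority points.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
majority = [(2.0, 0.0), (2.0, 1.0), (3.0, 0.0), (3.0, 1.0), (2.0, 2.0),
            (3.0, 2.0), (1.0, 2.0), (4.0, 1.0), (4.0, 0.0)]
synthetic = adasyn(minority, majority)
print(len(minority) + len(synthetic), "minority samples after oversampling")
```

Because every synthetic point is interpolated between two existing minority points, the new samples stay inside the region already occupied by the minority class. In practice a maintained implementation, such as the `ADASYN` class in the imbalanced-learn package, would normally be used instead of a hand-written version.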
Keywords/Search Tags: Sub-mitochondrial localization, Doc2vec, Machine learning, Imbalance learning, Django framework