
Prediction Of Protein Subcellular Localization Based On Two Regular Term Nonnegative Matrix Decomposition

Posted on: 2019-01-26
Degree: Master
Type: Thesis
Country: China
Candidate: P Zhou
GTID: 2370330578982103
Subject: Signal and Information Processing
Abstract/Summary:
The intracellular environment consists of a series of complex, regulated biochemical processes, and normal life activities depend on proteins appearing in specific subcellular locations to perform specific functions. Knowing the subcellular localization of human proteins is therefore essential to understanding cellular processes and the functions of those proteins. However, manually annotating protein subcellular locations is costly in both time and money, so it is important to build automated protein subcellular localization systems with the aid of efficient machine learning algorithms. At the same time, advances in imaging technology have produced large amounts of image data in bioinformatics, which creates a need to turn biological image data into biological knowledge. Against this background, image-based protein subcellular localization has attracted much attention.

Image-based protein subcellular localization prediction consists of four main steps: data set preparation, image source separation, feature extraction and selection, and pattern classification. A series of algorithms has been proposed for each of these steps. This thesis focuses on image source separation, which aims to separate the protein signal from the image. Image source separation is the first step of data processing, so errors introduced there accumulate and grow in the subsequent steps and strongly affect the prediction accuracy of the system. Studying image source separation algorithms is therefore of great significance for improving the classification accuracy of the final prediction system. Existing image separation methods fall into two main types: linear separation and blind source separation. Linear separation cannot adapt to changes in color staining across images, which weakens the model's explanatory power, whereas blind source separation avoids this problem.

In this thesis, we present a new blind source separation algorithm that adds sparsity and minimum-volume constraints to nonnegative matrix factorization. The main advantages of the new algorithm are: (1) it suits image data, which are nonnegative and mixed; (2) constraining the simplex that encloses the image data to minimum volume makes the components of the solution more independent; (3) the sparsity constraint makes the solution sparser.

To test the effectiveness of the proposed algorithm, a human protein subcellular location analysis system has to be established. Most current analysis systems rely on single-label learning, which assigns a human protein to only one subcellular location. However, some proteins exist simultaneously in two or more subcellular organelles, or are transported between organelles and show a dynamic distribution, so single-label learning algorithms no longer apply. Multi-label learning algorithms are therefore needed to assign proteins to several subcellular organelles at the same time. In this thesis, we design an image-based human protein subcellular localization system that uses multi-label learning to handle the multi-label classification problem. The main steps of the system are: (1) a data set is prepared from the Human Protein Atlas; (2) the proposed algorithm is applied to image source separation; (3) for feature extraction, both global subcellular location features and local binary descriptors are used to characterize the protein distribution, and stepwise discriminant analysis is used for dimensionality reduction; (4) two multi-label algorithms that differ in whether they take class correlation into account are used for classification: binary relevance (BR) and classifier chains (CC).

The experimental results show that the proposed algorithm outperforms linear separation by 6% and standard nonnegative matrix factorization by 10%, which suggests that it models the image signal separation better. For feature extraction, combining global and local features is superior to using global features alone, which indicates that local features can make up for information missing from the global features. For classification, binary relevance and classifier chains each have their own advantages and shortcomings, so both class correlation and class independence should be taken into account when designing the classifier.
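The abstract does not state the objective function or the optimization scheme, so the following is only a minimal sketch of what nonnegative matrix factorization with two regularization terms might look like, not the thesis's actual algorithm. It assumes a pixel matrix `X` (color channels by pixels) factored as `W @ H`, an L1 penalty on the abundances `H` as the sparsity term, and the sum of squared distances between the columns of `W` and their centroid as a simple stand-in for the minimum-volume term. The names `regularized_nmf`, `objective`, `lam` and `mu`, the toy data, and the choice of alternating projected gradient steps are all assumptions made for illustration.

```python
import numpy as np


def objective(X, W, H, lam, mu):
    """0.5*||X - WH||_F^2 + lam*sum(H) + 0.5*mu*sum_j ||w_j - centroid||^2."""
    fit = 0.5 * np.sum((X - W @ H) ** 2)
    sparsity = lam * np.sum(H)  # equals the L1 norm of H because H >= 0
    spread = W - W.mean(axis=1, keepdims=True)  # columns of W minus their centroid
    volume = 0.5 * mu * np.sum(spread ** 2)  # assumed surrogate for simplex volume
    return fit + sparsity + volume


def regularized_nmf(X, k, lam=0.01, mu=0.01, n_iter=300, seed=0):
    """Alternating projected gradient steps on H and W (illustrative only)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k))  # hypothetical stain color basis, one column per source
    H = rng.random((k, n))  # hypothetical per-pixel abundances of each source
    history = []
    for _ in range(n_iter):
        # H step: gradient of the fit term plus the constant L1 gradient lam,
        # step size 1/L with L the spectral norm of W^T W, then clip at zero.
        step_h = 1.0 / (np.linalg.norm(W.T @ W, 2) + 1e-12)
        grad_h = W.T @ (W @ H - X) + lam
        H = np.maximum(H - step_h * grad_h, 0.0)

        # W step: gradient of the fit term plus the volume surrogate,
        # step size 1/L with L bounded by ||H H^T||_2 + mu, then clip at zero.
        step_w = 1.0 / (np.linalg.norm(H @ H.T, 2) + mu + 1e-12)
        spread = W - W.mean(axis=1, keepdims=True)
        grad_w = (W @ H - X) @ H.T + mu * spread
        W = np.maximum(W - step_w * grad_w, 0.0)

        history.append(objective(X, W, H, lam, mu))
    return W, H, history


# Toy data: 3 color channels, 2 hypothetical sources, 500 pixels with sparse abundances.
rng = np.random.default_rng(1)
W_true = rng.random((3, 2))
H_true = rng.random((2, 500)) * (rng.random((2, 500)) < 0.5)
X = W_true @ H_true

W, H, history = regularized_nmf(X, k=2)
print("relative reconstruction error:", np.linalg.norm(X - W @ H) / np.linalg.norm(X))
```

In this sketch `lam` and `mu` trade reconstruction accuracy against sparsity of `H` and compactness of the columns of `W`; setting both to zero reduces it to plain projected-gradient NMF. Other minimum-volume surrogates, such as a log-determinant of `W.T @ W`, would fit into the same loop by changing only the `W` gradient and step size.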
Keywords/Search Tags:Protein subcellular localization, Image source separation, Nonnegative matrix factorization based on two regularizations, Multi-label learning, Local and global features