Research On Protein Subcellular Location Classification Based On Feature Learning

Posted on: 2021-08-26
Degree: Master
Type: Thesis
Country: China
Candidate: B Tan
Full Text: PDF
GTID: 2480306470963029
Subject: Control Science and Engineering
Abstract/Summary:
Proteins are the material basis of all living organisms and the main carriers of life activities. Proteomics has therefore gradually become a core research area of gene function analysis in the Human Genome Project, and its main task is to study the function of proteins in the cellular environment. The function of a protein is closely related to its location and distribution within the cell. After being synthesized in the ribosomes, a protein is transported to a specific organelle, i.e. a subcellular structure, where it governs the vital characteristics and normal functioning of the cell. Recognizing the subcellular location of a protein therefore plays an important role in understanding its function, studying cell metabolism, diagnosing disease and discovering drugs.

In recent years, with the rapid development of fluorescence microscopy, high-throughput fluorescence microscopes have become able to produce large numbers of multi-label fluorescent protein cell images automatically and rapidly. Compared with earlier biological experiments or amino acid sequence studies, multi-label fluorescent protein cell images show the distribution of specific proteins in cells more accurately and intuitively. However, because multiple labeled proteins are imaged together and some subcellular structures appear very small in the image, the accuracy of existing multi-label protein subcellular classification algorithms is far below that of manual labeling by experts. Focusing on this task, this thesis studies feature extraction models and multi-label classification algorithms for multi-label fluorescent protein cell images, and proposes a new supervised feature learning prediction model. Experiments on a public data set show that it outperforms the commonly used hand-crafted subcellular location features (SLFs). The main work of this thesis is as follows:

(1) The cell image data set from the Human Protein Atlas (HPA) selected for this thesis is analyzed in detail, and targeted data augmentation and preprocessing are carried out according to its characteristics. The widely used BN-Inception convolutional network is used for feature learning and extraction, and the model is trained with the binary cross-entropy loss (BCELoss) commonly used in multi-label classification. The combination BN-Inception+BCELoss serves as the basic feature learning model in this thesis. Its effectiveness is confirmed experimentally, and it is the baseline for the improvements to the network structure and the classification algorithm.

(2) Because HPA cell images are fine-grained images, this thesis introduces and analyzes how the bilinear convolutional neural network (B-CNN) extracts fine-grained features, and on this basis improves the basic network BN-Inception into the BF-BNInception network. This network replaces the dual-network structure of B-CNN with a single network by extracting intermediate-layer feature maps as auxiliary information. It retains the effectiveness of fine-grained feature extraction while greatly reducing the number of model parameters and improving classification performance.

(3) To address class imbalance in the HPA cell image data set, a smoothly adjustable class weight factor is proposed and combined with the focal loss function (FLoss), which originates in detection tasks, to form BFLoss, a loss function for multi-label classification. Replacing the basic classification loss BCELoss with the proposed BFLoss effectively alleviates the impact of class imbalance on model training.

(4) The two improvements are first tested independently to demonstrate their effectiveness and the degree of performance improvement each provides. They are then combined into the final model, BF-BNInception+BFLoss. The overall improvement on the HPA cell image data set is verified by comparison with the basic feature learning model BN-Inception+BCELoss. The experimental results also show that the final model BF-BNInception+BFLoss greatly improves the accuracy of protein subcellular location classification in fluorescent cell images compared with the classification model based on hand-crafted subcellular location features.

The feature learning model proposed in this thesis is helpful for subcellular location annotation of the large batches of fluorescent protein images generated by high-throughput fluorescence microscopy. It overcomes the shortcomings of traditional models in automatic annotation accuracy and has some academic value for bioinformatics research.
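The abstract does not give the formula for BFLoss, so the following is only a minimal sketch of the general idea in item (3): a per-class weight combined with a focal loss for multi-label outputs. The weight formula (inverse class frequency raised to a smoothing exponent `beta`), the function names and the default values are all assumptions made for illustration and are not the thesis's definition.

```python
import math


def smoothed_class_weights(pos_counts, beta=0.5):
    """Hypothetical class weights (an assumption, not the thesis's formula).

    Each weight is the inverse class frequency raised to `beta`, then
    rescaled so the weights average to 1. beta=0 gives uniform weights,
    beta=1 gives plain inverse-frequency weights.
    """
    total = sum(pos_counts)
    raw = [(total / n) ** beta for n in pos_counts]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]


def weighted_focal_loss(probs, targets, weights, gamma=2.0, eps=1e-7):
    """Class-weighted focal loss for one multi-label sample.

    probs   -- predicted per-class probabilities (after a sigmoid)
    targets -- 0/1 labels, one per class
    weights -- per-class weights, e.g. from smoothed_class_weights
    gamma   -- focusing parameter; gamma=0 with unit weights is plain BCE
    """
    loss = 0.0
    for p, y, w in zip(probs, targets, weights):
        p = min(max(p, eps), 1.0 - eps)   # clamp to keep log() finite
        p_t = p if y == 1 else 1.0 - p    # probability of the true label
        loss += -w * (1.0 - p_t) ** gamma * math.log(p_t)
    return loss / len(probs)


# Example: a common class (1000 positives) and a rare class (10 positives).
weights = smoothed_class_weights([1000, 10], beta=0.5)
loss = weighted_focal_loss([0.9, 0.2], [1, 0], weights)
```

With `gamma=0` and unit weights the function reduces to the binary cross-entropy baseline, which makes it easy to compare the two losses on the same predictions.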
Keywords/Search Tags:subcellular location classification, feature learning, fine-grained image, class imbalance, multi-label classification