Research On Method Of Feature Extraction For Sequence Based On New Expression Pattern And Its Application

Posted on: 2018-06-07
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Q L Xia
GTID: 1310330542969448
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, rapid progress in high-throughput genomic and proteomic techniques has produced large volumes of nucleotide and protein sequence data, which are collected and stored in a number of bioinformatics databases. Mining these data for information on the structure and function of nucleotides and proteins is a current focus of bioinformatics and an area where breakthroughs are expected. Nucleotide and protein sequences contain the most important information in the biological code. By comparing the nucleotide sequences of different species, we can measure their similarity and then infer the phylogenetic relationships among the species. The subcellular location of a protein is directly related to its function, and a protein can perform its role only in specific subcellular locations. Subcellular localization is therefore an important source of information for studying protein function. Protein post-translational modifications can occur at any key step in a protein's life cycle. Sequence-based analysis of proteins and their post-translational modifications has important implications for the study of common human diseases such as heart disease, cancer, neurodegenerative diseases, and diabetes. With the rapid growth of biological data, verifying results entirely by biological experiment is almost impossible. It is therefore imperative to develop information science methods that, combined with molecular biology, immunology, cell biology, and other traditional biological disciplines, can support research in these fields. This dissertation takes real DNA and protein sequence data as its research object and studies feature extraction methods for sequences. Using machine learning methods, it addresses similarity analysis of DNA sequences and the prediction of post-translational modification and subcellular localization from protein sequences. The main research work is as follows:

(1) Based on the four-way classification of genes, the primary DNA sequence is transformed into a "structure diagram". By considering the properties of the three groups and improving the molecular topological index of the distance matrix, an encoding of the DNA sequence is proposed. The DNA sequence is then converted into six molecular topological indices computed from six graphical structures. Using Euclidean distance as the similarity measure, we carried out a similarity analysis of 11 species. The resulting similarities show that the homology is consistent with the evolutionary relationships, and the experimental results agree with the evolution of the species. The advantage of the method is that the graphical invariants of a sequence are easy to compute and can be used to compare DNA sequences in place of the raw strings themselves.

(2) A new graphical representation of DNA sequences based on the Fermat spiral is proposed and used to carry out accurate similarity analysis of genes across species. First, the method plots the gene sequence along a Fermat spiral while preserving the original positional information of the sequence. Then, from the local positional relationship between adjacent bases in the original gene sequence, a mass is calculated by a specific method and assigned to each point on the spiral, so that the spiral becomes a system of particles. The normalized moment of inertia of this Fermat spiral is then calculated as a numerical representation of the DNA sequence and applied to the similarity analysis of the ?-globulin gene of different species.

(3) Based on the distribution of genetic codons, a 3D representation of protein sequences is proposed. The method derives the distribution of the 20 amino acids in three-dimensional space from the nucleotide triplets, constructs a 3D pattern of the protein sequence, and applies it to phylogenetic tree construction for biological sequences. A new recursive distance calculation is then proposed that computes the distance between sequences of different lengths without a feature matrix, which eliminates the matrix construction step. It is applied to the subcellular localization of apoptosis proteins. The experimental results show that this method has certain advantages over some machine learning methods. Moreover, because it requires neither a feature matrix nor a machine learning process, the calculation is simple and easy to implement.

(4) Protein malonylation is a newly discovered post-translational modification. Because of the limitations of experimental techniques, identifying malonylation sites quickly and accurately is a major challenge. A method for predicting lysine malonylation sites based on pseudo-amino acid composition is proposed. First, a method of constructing protein fragments is proposed that places a lysine at the center of each fragment. Then, features are extracted from the fragments using the pseudo-amino acid composition. Malonylation and non-malonylation sites are then distinguished with a support vector machine. In the experiment, we successfully identified 144 of the 160 given malonylation sites. In addition, the differences observed between malonylated and non-malonylated fragments show that lysine malonylation follows a specific pattern, so the proposed method can be developed further to identify malonylation sites.
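The fragment step in (4), where each fragment has a lysine at its center, can be illustrated with a minimal Python sketch. The abstract does not give the fragment length or say how the ends of a protein are handled, so the function name `lysine_windows`, the `flank` half-width, and the `X` padding character are all assumptions made for illustration, not the dissertation's actual procedure.

```python
from typing import List, Tuple


def lysine_windows(sequence: str, flank: int = 7, pad: str = "X") -> List[Tuple[int, str]]:
    """Return (1-based position, fragment) for every lysine (K) in `sequence`.

    Each fragment has length 2 * flank + 1 with the lysine at index `flank`.
    Positions that would fall beyond either terminus are filled with `pad`
    (an assumed convention; the abstract does not specify one).
    """
    seq = sequence.upper()
    # Pad both ends so that windows near the termini keep the full length.
    padded = pad * flank + seq + pad * flank
    windows = []
    for i, residue in enumerate(seq):
        if residue == "K":
            # Residue i of `seq` sits at index i + flank in `padded`,
            # so this slice is centered on it.
            windows.append((i + 1, padded[i : i + 2 * flank + 1]))
    return windows


if __name__ == "__main__":
    # Made-up peptide, used only to show the shape of the output.
    for position, fragment in lysine_windows("MKTAYIAKQR", flank=3):
        print(position, fragment)
```

Fragments of equal length produced this way could then be turned into feature vectors (for example, by a pseudo-amino acid composition) and passed to a classifier such as a support vector machine; those later steps are not shown because the abstract does not describe their parameters.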
Keywords/Search Tags: DNA sequence similarity analysis, Protein post-translational modification, Subcellular localization prediction, Pseudo-amino acid composition, Support vector machine