Exploration And Application Of Visualization And Feature Numeralization For Multiple Sequence Alignments

Posted on:2023-03-22

Degree:Master

Type:Thesis

Country:China

Candidate:L Zhou

GTID:2530306902992459

Subject:Bioinformatics

Abstract/Summary:

1. Visualization of sequence alignments

Identifying the conserved and variable regions in a multiple sequence alignment (MSA) is critical to understanding gene function. As the sequence-structure-function relationship gains increasing attention in molecular biology, a simple display of nucleotide or protein sequence alignments is no longer sufficient. Although existing visualization tools provide diverse functions for mining various types of information from MSAs, a number of issues remain unresolved. First, it is difficult to capture the molecular characteristics hidden in an MSA by simply displaying nucleotide or protein sequence alignments at the site level; other information, such as residue dominance and residue dependencies, is also helpful when presenting MSA data. Second, because recombination alters sequence fragments, recombination events require a more intuitive presentation. Third, it is still challenging to combine external data with an MSA in an efficient, accessible, and customizable way. Last, visualizing genome alignments involves presenting aligned fragments between species together with rearrangement information in an efficient way, but this is rarely covered by other tools.

To address these issues, we implemented ggmsa, an R package providing a comprehensive set of methods for analyzing and visualizing MSAs by individuals or groups. It includes sequence logos, sequence bundles, stacked sequence alignment visualization, and nucleotide comparative plots. These methods help identify conserved or variable trends in MSAs and residue-residue dependencies, and can be used to mine clues of recombination events. In addition, to explore the correlation between sequences and the corresponding individual phenotypes or other data, ggmsa implements integrated visualization of MSAs, phylogenetic trees, and associated data (e.g., ancestral sequences, expression levels, genome locus structure, molecular functions) with the assistance of the in-house developed packages ggtree and ggtreeExtra, which helps to discover the underlying evolutionary features. We also designed a new visualization method for genome alignments in Multiple Alignment Format (MAF) to explore the patterns of within- and between-species variation.

2. Representing biological sequences as numerical values

Numerical sequence features are numerical vectors that computers can process directly. They are often used in the prediction and classification of biomolecules. In addition to sequence visualization, this study also explores new applications of numerical sequences. We designed an R package, UltraPseR, which is a wrapper of UltraPse and contains multiple sequence coding schemes. It can transform the composition and order of nucleotide or protein sequences into fixed-length numerical vectors that can be fed directly into machine learning algorithms. UltraPseR thus allows users to quickly transform biological sequences into numerical vectors, which can be combined with machine learning algorithms to efficiently complete prediction and classification tasks on biological sequences. In this study, UltraPseR was applied to Human Leukocyte Antigen (HLA) gene sequences, and a support vector machine was trained on the numerical HLA representations to explore the feasibility of the numerical sequence method in HLA genotyping.
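
To make the idea of a fixed-length numerical vector concrete, the following is a minimal Python sketch of one simple coding scheme, plain k-mer composition. It is an illustration only: it is not the UltraPseR or UltraPse API, and the function name `kmer_composition`, its parameters, and the choice to skip windows containing gaps or ambiguous characters are all assumptions made for this example.

```python
from itertools import product


def kmer_composition(seq, k=2, alphabet="ACGT"):
    """Return normalized k-mer frequencies as a fixed-length list.

    Hypothetical helper for illustration; not part of UltraPseR.
    The vector length is len(alphabet) ** k regardless of sequence length.
    """
    # Enumerate all possible k-mers in a fixed (lexicographic) order.
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}

    counts = [0] * len(kmers)
    total = 0
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        # Assumption: windows with gaps or ambiguous symbols are skipped.
        if j is not None:
            counts[j] += 1
            total += 1

    if total == 0:
        return [0.0] * len(kmers)
    return [c / total for c in counts]


# Sequences of different lengths map to vectors of the same length.
short_vec = kmer_composition("ACGT")
long_vec = kmer_composition("ACGTTGCAACGTAC")
print(len(short_vec), len(long_vec))
```

Because every sequence maps to a vector of the same length, the resulting rows can be stacked into a feature matrix and passed to a classifier such as a support vector machine. Plain composition discards most order information, which is one reason coding schemes that also capture sequence order are used in practice.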

Keywords/Search Tags:

Multiple sequence alignment, Visualization, Represent sequences, Phylogeny, Machine learning

Related items

1	Study On Alignment Method Of Biological Multiple Sequences And Key Technologies
2	Study Of Multiple Alignment Algorithm For DNA Sequences Based On Graph
3	Multiple Sequence Alignment. Bioinformatics Algorithm
4	Research On Fast Multiple Sequence Alignment Based On Clustering
5	Bioinformatics Multiple Sequence Alignment And Phylogenetic Spanning Tree Of Several Techniques And Algorithms
6	Biological Sequence Alignment Algorithm
7	Virus Sequence Alignment And Classification Based On Hybrid Machine Learning
8	Study Of Several Algorithms For Alignment Problem Of Sequence And Sequence Secondary Structure
9	Research On Multiple Sequence Alignment Method Based On Single Molecule Sequencing Data
10	Research And Application Of Biological Sequence Feature Coding Methods