A New Method To Identify Potentially Retrotransposition-competent LINE-1

Posted on: 2018-08-31
Degree: Master
Type: Thesis
Country: China
Candidate: X F Xu
GTID: 2310330536968954
Subject: Pharmacy
Abstract/Summary:
Long interspersed nuclear elements (LINE-1 or L1) are the most active and successful transposable elements in the majority of animals. L1 sequences make up almost 17.5% of the human genome, and more than 6,700 copies are distributed across all chromosomes. However, most of these L1s carry 5′ truncations and various other mutations that leave them unable to move within the genome. Recent work has reported that some full-length L1s are still "hot" (highly active) in cultured-cell retrotransposition assays, and another study cloned candidate full-length L1s from the human genome, tested their activity, and estimated that the genome contains 80 to 100 retrotransposition-competent L1s. Hot L1s can readily compromise the stability of the human genome and are responsible for many disease-causing mutations.

Apart from the retrotransposition assay, however, there is no effective way to distinguish retrotransposition-competent L1s from the vast majority of inactive copies. L1 function has mainly been analyzed through mutagenesis experiments, and some mutations have been shown to abolish or reduce L1's ability to retrotranspose. A reliable and accessible way to predict L1 activity is therefore urgently needed. In this work, a new method based on a support vector machine (SVM) is introduced: it characterizes the DNA sequence of an L1 of unknown activity and classifies it as either active or inactive. Three characteristic sequences, derived from nucleotide type, hydrogen-bond type, and chemical group, together uniquely determine a DNA sequence. The global protein sequence descriptors, which generate features from composition, transition, and distribution, can also be used to describe a DNA characteristic sequence. Each characteristic sequence thus yields a 13-dimensional feature vector, and each DNA sequence is transformed into a 39-dimensional vector.

From earlier studies, 168 L1s of experimentally identified activity were collected as the training dataset for SVM modeling. The final SVM model was validated on an independent test set and gave acceptable results. Screening the human reference genome sequence with this model predicted 81 active L1s.
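The feature-extraction step described above can be sketched in Python. This is a minimal, hypothetical illustration, not the thesis's code, and it rests on several assumptions the abstract does not spell out: "nucleotide type" is taken to mean purine (A, G) versus pyrimidine (C, T), hydrogen-bond type means strong (G, C) versus weak (A, T), and chemical group means amino (A, C) versus keto (G, T). Under these groupings each base gets a distinct 0/1 triple, which is consistent with the claim that the three characteristic sequences uniquely determine the DNA sequence. The 13 values per characteristic sequence are assumed to follow the usual CTD layout for two classes (2 composition + 1 transition + 5 distribution values per class), and the quantile rule used for "distribution" is one common convention among several. The names `ENCODINGS`, `characteristic_sequence`, `ctd_13` and `feature_vector` are illustrative only.

```python
import math

# ASSUMPTION: the abstract names the three properties but not the exact
# groupings; these are the standard binary classifications of DNA bases.
ENCODINGS = {
    "nucleotide_type": {"A": 1, "G": 1, "C": 0, "T": 0},  # purine (1) vs pyrimidine (0)
    "hydrogen_bond": {"G": 1, "C": 1, "A": 0, "T": 0},    # strong, 3 H-bonds (1) vs weak, 2 (0)
    "chemical_group": {"A": 1, "C": 1, "G": 0, "T": 0},   # amino (1) vs keto (0)
}


def characteristic_sequence(dna, mapping):
    """Map a DNA string to a 0/1 list; non-ACGT symbols (e.g. N) are skipped."""
    return [mapping[base] for base in dna.upper() if base in mapping]


def ctd_13(bits):
    """Composition/transition/distribution descriptors of a binary sequence.

    ASSUMPTION: CTD with two classes, giving
    2 composition + 1 transition + 2 * 5 distribution = 13 values.
    """
    n = len(bits)
    if n == 0:
        return [0.0] * 13
    # Composition: fraction of positions belonging to each class.
    composition = [bits.count(1) / n, bits.count(0) / n]
    # Transition: fraction of adjacent pairs where the class changes (1->0 or 0->1).
    changes = sum(1 for a, b in zip(bits, bits[1:]) if a != b)
    transition = [changes / (n - 1) if n > 1 else 0.0]
    # Distribution: relative positions (1-based, divided by n) of the first,
    # 25%, 50%, 75% and last occurrence of each class.
    distribution = []
    for cls in (1, 0):
        positions = [i + 1 for i, b in enumerate(bits) if b == cls]
        k = len(positions)
        for frac in (0.0, 0.25, 0.5, 0.75, 1.0):
            if k == 0:
                distribution.append(0.0)  # class absent from the sequence
            else:
                idx = max(1, math.floor(frac * k))  # index of the occurrence to report
                distribution.append(positions[idx - 1] / n)
    return composition + transition + distribution


def feature_vector(dna):
    """Concatenate the 13 CTD values of each characteristic sequence (3 x 13 = 39)."""
    vector = []
    for mapping in ENCODINGS.values():
        vector.extend(ctd_13(characteristic_sequence(dna, mapping)))
    return vector


if __name__ == "__main__":
    print(len(feature_vector("ACGT")))  # 39
    print(ctd_13([1, 0, 1, 0]))
```

In a full pipeline, the 39-dimensional vectors of the 168 labeled L1s would then be passed to an SVM implementation (for example scikit-learn's `SVC`); the kernel and hyperparameters are not given in the abstract, so any such choice would be an additional assumption.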
Keywords/Search Tags: LINE-1, SVM model, DNA sequence, LINE-1 activity