A piRNA Prediction Algorithm Based on Transposon Interaction Information and the Discovery of piRNAs in Chilo suppressalis

Posted on: 2015-11-29
Degree: Master
Type: Thesis
Country: China
Candidate: K Wang
Full Text: PDF
GTID: 2310330512972758
Subject: Agricultural Entomology and Pest Control
Abstract/Summary:
Piwi-interacting RNA (piRNA) is a class of small non-coding RNA discovered in 2006. According to current studies, piRNAs are synthesized in two ways: in germ cells, primary piRNAs are amplified by the "ping-pong" loop mechanism; in somatic cells, piRNAs are produced with the help of the relevant proteins. The known biological functions of piRNAs are as follows. First, silencing of gene transcription: studies in fruit fly and rat have provided evidence that piRNAs take part in gene silencing. Second, maintaining the integrity of germ cells and stem cells: piRNAs repress transposons in these cells and so preserve their basic functions. Third, regulating mRNA translation and stability: in specific tissues or developmental stages, piRNAs can regulate the expression of protein-coding genes. In addition, piRNAs can guide epigenetic mechanisms.

piRNA sequences have several characteristic features. First, piRNAs are longer than other small RNAs: about 30 nt, compared with about 21 nt. Second, piRNAs are clustered in the genome, in regions of roughly 20 kb to 90 kb. Finally, piRNAs show a strong preference for uracil at the 5' end. At present, piRNAs are discovered mainly by molecular biology methods; bioinformatics algorithms for piRNA prediction are rare and of limited precision.

In this work, a new piRNA prediction algorithm was developed, based on the study of transposon sequences and using a support vector machine (SVM). Two data sets were downloaded from the UCSC Genome Browser and the NONCODE database: transposon sequences of four species (D. melanogaster, H. sapiens, R. norvegicus, M. musculus) and piRNA sequences of three species (H. sapiens, R. norvegicus, M. musculus), respectively. The piRNA sequences of D. melanogaster were downloaded from NCBI. In total there were 13,848 piRNA sequences for D. melanogaster, 32,152 for H. sapiens, 66,758 for R. norvegicus and 75,814 for M. musculus.

The piRNAs of D. melanogaster were used as training data for the SVM classifier. The negative data set was built from non-coding sequences cut at random from the D. melanogaster NONCODE data set, keeping only fragments that could also be mapped to D. melanogaster transposons. The negative sample has the same length distribution as the real piRNA sequences of D. melanogaster. The positive sample contains 9,758 real piRNA sequences and the negative sample contains 9,240 non-piRNA sequences. Triplet structure features of the piRNA sequences were extracted from the piRNA-transposon interaction information obtained with SeqMap and RNAplex, and these features were used to train the SVM classifier. Grid search was used to optimize the parameters of the SVM model, and 10-fold cross-validation was used for training and evaluation.

The SVM model performed well on the Drosophila data, achieving 95.3±0.33% accuracy and 96.0±0.5% sensitivity. The same classifier also predicted human, rat and mouse piRNAs, with accuracies of 93.50%, 88.98% and 89.18%, respectively. Applying the algorithm to Chilo suppressalis small RNA data predicted 82,639 piRNAs, and the features of these sequences were analyzed. The results show that insect piRNAs differ from mammalian piRNAs: the two groups have different sequence length distributions, and mammalian piRNAs have a higher percentage of 5' uracil, whereas insect and mammalian piRNAs have the same base composition.
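The abstract states that the negative sample shares the length distribution of the real piRNAs. One way this could be done is sketched below in Python; it is an illustration under assumptions, not the thesis's procedure. The function names and the toy sequences are hypothetical, and the additional filter used in the thesis (keeping only fragments that map to transposons) is left out.

```python
import random
from collections import Counter


def sample_length_matched_negatives(positives, background, seed=0):
    """Cut one random fragment from `background` for each positive sequence,
    using the same length, so both sets share a length distribution."""
    rng = random.Random(seed)
    negatives = []
    for pos in positives:
        length = len(pos)
        # Only background sequences long enough to hold a fragment of this length.
        candidates = [s for s in background if len(s) >= length]
        if not candidates:
            continue  # no background sequence is long enough; skip this one
        source = rng.choice(candidates)
        start = rng.randrange(len(source) - length + 1)
        negatives.append(source[start:start + length])
    return negatives


def length_distribution(seqs):
    """Map sequence length -> number of sequences with that length."""
    return Counter(len(s) for s in seqs)


# Toy data, invented for illustration only.
positives = ["UGACUAGCUAGCUAGGAUCGAUCGA", "UAGCUAGGCUAGCAUCGAUCGAUCGAUCA"]
background = ["ACGUACGUAGCUAGCUAGCUAGCAUCGAUCGAUCGUAGCUAGCUAGCUAGCAUGCAUGCAUCG"]

negatives = sample_length_matched_negatives(positives, background)
print(length_distribution(positives))
print(length_distribution(negatives))
```

Drawing one fragment per positive keeps the two length histograms identical by construction; a real pipeline would then discard fragments that fail the transposon-mapping step and resample.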
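The accuracy and sensitivity figures above are reported as mean ± deviation over 10-fold cross-validation. The following Python sketch shows how such figures can be computed. It is not the thesis's code: the stratified fold assignment is an assumption (the abstract does not say how folds were formed), the `fit` interface is hypothetical, and a simple threshold rule stands in for the SVM so that the example needs only the standard library.

```python
import random
from statistics import mean, stdev


def stratified_folds(y, k=10, seed=0):
    """Deal the indices of each class into k folds (assumed scheme)."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        members = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(members)
        for j, i in enumerate(members):
            folds[j % k].append(i)
    return folds


def accuracy_and_sensitivity(y_true, y_pred):
    """Labels: 1 = piRNA, 0 = non-piRNA."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    n_pos = sum(1 for t in y_true if t == 1)
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / n_pos if n_pos else float("nan")
    return accuracy, sensitivity


def cross_validate(X, y, fit, k=10, seed=0):
    """`fit(X_train, y_train)` must return a predict function (hypothetical interface)."""
    scores = []
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        y_pred = [predict(X[i]) for i in test_idx]
        scores.append(accuracy_and_sensitivity([y[i] for i in test_idx], y_pred))
    accs, sens = zip(*scores)
    return (mean(accs), stdev(accs)), (mean(sens), stdev(sens))


def fit_threshold(X_train, y_train):
    """Stand-in classifier (not an SVM): threshold halfway between class means."""
    pos_mean = mean(x for x, t in zip(X_train, y_train) if t == 1)
    neg_mean = mean(x for x, t in zip(X_train, y_train) if t == 0)
    cut = (pos_mean + neg_mean) / 2
    return lambda x: 1 if x > cut else 0


# Toy one-feature data, invented for illustration: 50 negatives, 50 positives.
X = [i / 100 for i in range(50)] + [2 + i / 100 for i in range(50)]
y = [0] * 50 + [1] * 50

(acc_mean, acc_sd), (sen_mean, sen_sd) = cross_validate(X, y, fit_threshold)
print(f"accuracy    {acc_mean * 100:.1f} ± {acc_sd * 100:.2f}%")
print(f"sensitivity {sen_mean * 100:.1f} ± {sen_sd * 100:.2f}%")
```

In practice `fit` would wrap the SVM training call, with the grid search over its parameters run inside each training fold.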
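The comparison of insect and mammalian piRNAs rests on three simple statistics: length distribution, the share of sequences beginning with uracil, and base composition. A minimal Python sketch of how these could be computed for a set of predicted sequences follows. The function name `summarize` and the toy sequences are hypothetical, and converting T to U is an assumption made so that DNA-alphabet reads are handled too.

```python
from collections import Counter


def summarize(seqs):
    """Return (length distribution, 5' uracil fraction, base composition)."""
    # Accept either DNA or RNA alphabet by mapping T to U (assumption).
    seqs = [s.upper().replace("T", "U") for s in seqs]
    lengths = Counter(len(s) for s in seqs)
    five_prime_u = sum(1 for s in seqs if s.startswith("U")) / len(seqs)
    bases = Counter("".join(seqs))
    total = sum(bases.values())
    composition = {b: bases[b] / total for b in "ACGU"}
    return lengths, five_prime_u, composition


# Toy sequences, invented for illustration only.
toy = ["UGCA", "TACG", "AUGC", "UUUU"]
lengths, five_prime_u, composition = summarize(toy)
print(lengths)
print(five_prime_u)
print(composition)
```

Running the same function on an insect set and a mammalian set gives directly comparable summaries of the three features discussed in the abstract.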
Keywords/Search Tags: piRNA, sequence classification, support vector machine (SVM), piRNA sequences of Chilo suppressalis