Plant Tissue Specific APA Sites Identification Based On SVM-RFE

Posted on: 2018-10-13
Degree: Master
Type: Thesis
Country: China
Candidate: P Y Li
Full Text: PDF
GTID: 2310330515952777
Subject: Control Engineering
Abstract/Summary:
Polyadenylation is a key step in processing a eukaryotic gene transcript into mature mRNA. The polyadenylation (poly(A)) site determines where transcription terminates and plays an important role in regulating gene expression. If a gene has multiple poly(A) sites, its pre-mRNA is selectively cleaved at one of them, so the gene can diversify its expression by producing mRNAs of different lengths. Alternative polyadenylation (APA) is widespread in eukaryotes: more than 70% of rice genes have more than one poly(A) site. Analyzing and distinguishing different types of APA sites helps in studying the mechanisms of tissue-specific gene expression and promotes understanding of the growth and development of organisms.

The study of tissue specificity is an important step in exploring life processes and cell function. With the development of biotechnology, expression data for all kinds of biological tissues are growing on a large scale. This makes tissue-specificity studies possible but also poses challenges for processing and analyzing large-scale genomic data. Current research on identifying tissue-specific APA sites focuses on animals. Because poly(A) sites are dispersed, variable, and complex, tissue-specific genes are very difficult to identify, and there is no relevant research on identifying tissue-specific APA sites in plants.

This study investigates the identification of tissue-specific APA sites in rice, based on the support vector machine (SVM) and the recursive feature elimination (RFE) algorithm. First, the APA sites were extracted and the data from 14 rice tissues were standardized to obtain the expression data of the APA sites. Second, tissue-specific and non-tissue-specific APA sites were selected from the expression data using the entropy-weighted mean method, and these served as the true and false data sets of tissue-specific APA sites. Third, based on the signal characteristics of the region downstream of the APA sites, the nearest-neighbor feature, the Z-curve feature, the secondary-structure feature, the nucleosome-position feature, and the first-order Markov heterogeneous matrix feature were extracted to form the feature space. Finally, using the 2693 tissue-specific APA sites identified by entropy as the training set, an SVM model was built with the SVM-RFE algorithm for feature selection. The experimental results show that SVM-RFE improves the recognition accuracy of the model from 0.68 to 0.7, and that the nearest-neighbor feature is the most important feature for identifying tissue-specific APA sites.
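The abstract does not spell out the entropy-weighted mean method, so the following is only a minimal sketch of the general idea behind entropy-based tissue specificity, using plain Shannon entropy as an assumed stand-in. The function name `tissue_entropy` and the two toy expression profiles are hypothetical and are not taken from the thesis.

```python
import math

def tissue_entropy(expr):
    """Shannon entropy (bits) of one site's expression profile across tissues.

    Low entropy: expression is concentrated in few tissues (tissue-specific).
    High entropy: expression is spread evenly (non-tissue-specific).
    """
    total = sum(expr)
    if total == 0:
        raise ValueError("site has no expression in any tissue")
    probs = [e / total for e in expr]
    return sum(-p * math.log2(p) for p in probs if p > 0)

# Hypothetical profiles over 14 tissues (the number of rice tissues in the abstract).
uniform = [10.0] * 14              # same expression everywhere
specific = [0.0] * 13 + [140.0]    # expressed in a single tissue

h_uniform = tissue_entropy(uniform)    # maximal entropy, log2(14)
h_specific = tissue_entropy(specific)  # minimal entropy, 0
print(h_uniform, h_specific)
```

Under this assumption, sites could be ranked by entropy and the lowest-scoring ones treated as candidate tissue-specific sites; any threshold and any weighting by mean expression would have to come from the thesis itself.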
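To illustrate the SVM-RFE idea named in the abstract, here is a small pure-Python sketch: train a linear SVM, drop the feature with the smallest squared weight, and repeat. The training routine (full-batch subgradient descent on the hinge loss), the hyperparameters, and the synthetic data are all assumptions made for illustration; they are not the thesis's implementation, features, or data.

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Full-batch subgradient descent on the L2-regularised hinge loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [lam * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # sample violates the margin, so it contributes
                for j in range(d):
                    gw[j] -= yi * xi[j] / n
                gb -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b

def svm_rfe(X, y, n_keep=1):
    """Return (features in elimination order, surviving features)."""
    active = list(range(len(X[0])))
    eliminated = []
    while len(active) > n_keep:
        Xa = [[row[j] for j in active] for row in X]
        w, _ = train_linear_svm(Xa, y)
        # RFE criterion: remove the feature with the smallest squared weight.
        worst = min(range(len(active)), key=lambda k: w[k] ** 2)
        eliminated.append(active.pop(worst))
    return eliminated, active

# Synthetic data (hypothetical): only feature 2 carries the class signal.
rng = random.Random(0)
y = [1 if i % 2 == 0 else -1 for i in range(60)]
X = [[rng.gauss(0, 1), rng.gauss(0, 1), 2 * yi + rng.gauss(0, 0.5), rng.gauss(0, 1)]
     for yi in y]

eliminated, kept = svm_rfe(X, y, n_keep=1)
print("eliminated in order:", eliminated, "kept:", kept)
```

The elimination order doubles as a feature ranking, which is how a claim such as "the nearest-neighbor feature is the most important" can be read off an SVM-RFE run. For real data, a library implementation such as scikit-learn's `RFE` wrapped around a linear `SVC` would be the more usual choice.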
Keywords/Search Tags:alternative polyadenylation, tissue-specificity, SVM-RFE, feature extraction