
Research Of PolyA Sites Prediction Of Microsporidia Based On Machine Learning

Posted on: 2018-01-18  Degree: Master  Type: Thesis
Country: China  Candidate: K Sun  Full Text: PDF
GTID: 2310330536973556  Subject: Computer system architecture
Abstract/Summary:
Bioinformatics emerged with the launch and development of the Human Genome Project. The intersection of biology and information technology has advanced information technology and greatly broadened the applications of biology. The State Key Laboratory of Silkworm Genome Biology at Southwest University is an advanced silkworm research laboratory; its current research covers the silkworm genome and functional genomics, silkworm genetic resources, modern silkworm and mulberry industry technology, and the pathogenic microorganisms of silkworm and mulberry together with the utilization of microbial resources. Silkworm pathogens infect silkworms, impair their growth and development, and can cause great losses to the silkworm industry, so they are attracting more and more scholars as a research direction. Organisms change constantly and their genomic information differs. Machine learning has been used to predict human and rice genes, but few researchers have studied the pathogens that infect silkworms, especially with computational algorithms. In this paper, we use machine learning algorithms to predict and study the PolyA sites of microsporidia, which is more efficient than biological methods. Computer science thus offers a useful approach to the biological study of microsporidia.

Machine learning uses experience to improve a system's own performance by computational means. As new technologies and methods have emerged in computer science, these algorithms have been applied to bioinformatics and are widely used in gene prediction. Polyadenylation is an important step in the formation of mature mRNA in eukaryotic cells, so predicting polyadenylation signals is of great significance for the study of coding genes. After thorough discussion with the silkworm microsporidia research group, we take the microsporidian Encephalitozoon cuniculi, which lacks an effective gene prediction method, as the reference model. Features of the Encephalitozoon cuniculi sequences are extracted with the Z-curve, the position-specific scoring matrix and k-gram nucleotide frequencies. After extracting the 1-, 2-, 3-, 4- and 5-gram frequencies, we combine them, compare the experimental results and select the optimal combination; this combination, the position-specific scoring matrix and the Z-curve form the final feature set. We then use PCA to reduce the dimension of the feature space, and finally classify the resulting features with different classifiers. Because the method selects the optimal k-gram nucleotide frequency features according to the expression preference of the microsporidian gene sequences, this choice affects the classification results. Accurate prediction of the PolyA sites of microsporidia therefore depends heavily on an appropriate feature extraction method.

Support vector machines are widely used in different fields and have achieved good results in text classification, license plate recognition and image retrieval. In this paper, we use a support vector machine, a neural network and the KNN algorithm to predict the PolyA sites of microsporidia. The results show that the support vector machine performs better. The kernel function is an important factor for a support vector machine. Since the conditionally positive definite kernel has been widely applied in text classification and face recognition, and building on the classification results obtained with the polynomial kernel in this paper, we combine the polynomial kernel with the conditionally positive definite kernel to form a new hybrid kernel function and apply it to PolyA site prediction in microsporidia. In the experiments, this mixed kernel, a linear combination of the polynomial kernel and the conditionally positive definite kernel, is used as the SVM kernel, and after the model parameters are adjusted the classification performance is greatly improved. The work provides a basis and a convenient tool for research on the biology of microsporidia, offers some theoretical foundation for the effective control of silkworm pests and diseases, and has important application value.
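
The abstract's feature extraction and mixed kernel can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the function names, the whole-sequence form of the Z-curve, the particular conditionally positive definite kernel K(u, v) = -||u - v||^beta + 1, the polynomial kernel parameters and the mixing weight are all hypothetical choices, since the abstract does not specify them. The position-specific scoring matrix and PCA steps are omitted.

```python
import math
from itertools import product

ALPHABET = "ACGT"

def kgram_frequencies(seq, k):
    """Relative frequency of every length-k nucleotide word, in lexicographic order."""
    words = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(words, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # windows containing N or other symbols are skipped
            counts[word] += 1
            total += 1
    return [counts[w] / total if total else 0.0 for w in words]

def z_curve(seq):
    """Whole-sequence Z-curve components computed from base frequencies (assumed form)."""
    n = len(seq)
    if n == 0:
        return [0.0, 0.0, 0.0]
    a, c, g, t = (seq.count(base) / n for base in ALPHABET)
    x = (a + g) - (c + t)  # purine vs pyrimidine
    y = (a + c) - (g + t)  # amino vs keto
    z = (a + t) - (g + c)  # weak vs strong hydrogen bonding
    return [x, y, z]

def extract_features(seq, ks=(1, 2, 3)):
    """Concatenate k-gram frequencies for the chosen k values with the Z-curve."""
    features = []
    for k in ks:
        features.extend(kgram_frequencies(seq, k))
    features.extend(z_curve(seq))
    return features

def polynomial_kernel(u, v, degree=2, coef0=1.0):
    """Standard polynomial kernel (u . v + coef0) ** degree."""
    return (sum(a * b for a, b in zip(u, v)) + coef0) ** degree

def cpd_kernel(u, v, beta=1.0):
    """One common conditionally positive definite kernel (assumed, 0 < beta <= 2)."""
    return -(math.dist(u, v) ** beta) + 1.0

def mixed_kernel(u, v, weight=0.5):
    """Linear combination of the two kernels; `weight` is a hypothetical parameter."""
    return weight * polynomial_kernel(u, v) + (1.0 - weight) * cpd_kernel(u, v)
```

For example, `extract_features("AATAAA")` returns a vector of length 4 + 16 + 64 + 3 = 87, and `mixed_kernel` can be evaluated on two such vectors. In practice the set of k values and the kernel parameters would be chosen by comparing validation results, as the abstract describes for the k-gram combination.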
Keywords/Search Tags:Microsporidia, Support Vector Machine, PolyA signals, Position-Specific Scoring Matrix, Conditional positive definite kernel