Study On Campylobacter Identification Method Based On Whole Genome Sequencing Data

Posted on:2021-03-31

Degree:Master

Type:Thesis

Country:China

Candidate:Z H Yang

Full Text:PDF

GTID:2404330602975161

Subject:Signal and Information Processing

Abstract/Summary:

Campylobacter spp. are zoonotic pathogens that can cause diarrhea in humans. Among the Campylobacter species identified so far, Campylobacter jejuni and Campylobacter coli are the two main pathogens; together they cause more than 90% of human diarrhea cases, in proportions of 90% and 10% respectively. Traditional biochemical methods for Campylobacter identification involve multiple steps, are time-consuming, and have low throughput. Polymerase chain reaction (PCR) based methods also have drawbacks, such as expensive reagents, multi-step experiments, and sample contamination that leads to false positives and false negatives. In recent years, whole genome sequencing has been applied to Campylobacter research. After processing and analysis, the sequencing data can be used to characterize different types of Campylobacter or to quickly identify genotypic characteristics of populations, such as virulence and drug resistance. In this thesis, a bioinformatics method capable of accurately identifying Campylobacter is constructed from whole genome sequencing data. The main work includes:

(1) A computational pipeline for Campylobacter identification based on whole genome sequencing data was constructed, comprising quality control of sequencing data, genome sequence assembly, whole genome feature extraction, and Campylobacter identification based on a support vector machine (SVM) or a deep neural network (DNN).

(2) Several quality control methods for sequencing data were studied and compared, and quality control tests were run on the whole genome sequencing data of a Campylobacter sample using FastQC. Several genome sequence assembly methods were likewise studied and compared, and the whole genome of the Campylobacter sample was assembled using SPAdes.

(3) Significantly different features were extracted by analyzing the whole genome sequences of Campylobacter samples through whole genome sequence analysis, gene annotation, drug resistance gene analysis, multilocus sequence typing (MLST), and CRISPR-Cas system analysis. The experimental results show that sequence length, GC content, codon sequence density, aspA allele number, glyA allele number, and the CRISPR repeat sequence NZ_CP017859₁ can serve as significant features for distinguishing Campylobacter jejuni from Campylobacter coli; of these, the repeat sequence NZ_CP017859₁ has high discriminative power.

(4) Two Campylobacter identification models, one based on SVM and one on DNN, were constructed with a feature set comprising genomic sequence length, GC content, codon sequence density, aspA allele number, glyA allele number, and the CRISPR repeat sequence NZ_CP017859₁. The experimental results show that both machine learning methods perform well for Campylobacter identification, with the DNN-based method slightly outperforming the SVM-based one.

In summary, the proposed computational method based on whole genome sequencing data can accurately distinguish Campylobacter jejuni from Campylobacter coli, and the related bioinformatics methods and pipelines can be used to analyze and study genome-wide sequence types of Campylobacter and even of other prokaryotes.
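
To make the feature-extraction step more concrete, two of the features named above, sequence length and GC content, can be computed directly from an assembled genome in FASTA format. The following is a minimal Python sketch under stated assumptions, not the implementation used in the thesis: the function names read_fasta and genome_features are hypothetical, the contigs in the demo are toy sequences rather than real Campylobacter data, and GC content is assumed to be computed over unambiguous bases (A, C, G, T) only.

```python
from io import StringIO
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (contig_id, sequence) pairs from FASTA-formatted lines."""
    name = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                yield name, "".join(chunks)
            fields = line[1:].split()
            name = fields[0] if fields else ""
            chunks = []
        elif name is not None:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def genome_features(lines: Iterable[str]) -> Dict[str, float]:
    """Return total assembly length and GC content over all contigs.

    Assumption: GC content is the fraction of G and C among the
    unambiguous bases (A, C, G, T); ambiguity codes such as N count
    toward sequence length but not toward GC content.
    """
    total_length = 0
    gc_count = 0
    acgt_count = 0
    for _, seq in read_fasta(lines):
        total_length += len(seq)
        g_and_c = seq.count("G") + seq.count("C")
        gc_count += g_and_c
        acgt_count += g_and_c + seq.count("A") + seq.count("T")
    gc_content = gc_count / acgt_count if acgt_count else 0.0
    return {"sequence_length": total_length, "gc_content": gc_content}


if __name__ == "__main__":
    # Toy assembly with two contigs (illustrative only).
    toy_fasta = StringIO(">contig1\nATGC\nGGCC\n>contig2\nATAT\n")
    print(genome_features(toy_fasta))
```

In practice the same function could be given an open file handle for the contigs file produced by the assembler. The resulting values would form two entries of the feature vector passed to the SVM or DNN classifier, alongside the MLST allele numbers and the CRISPR repeat feature.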

Keywords/Search Tags:

Campylobacter, Whole genome, CRISPR-Cas system, Support vector machine, Neural network

Related items

1	Anticancer Drug Response Classification Based On Deep Neural Network And Support Vector Machine
2	A Machine Learning-based Approach To Building Predictive Models For The Field Of Traumatic Brain Injury
3	Application Of Multi-label Support Vector Machine In X-RAY Lung Disease Detection
4	Study On Water Quality Status Of Three Gorges Region Forecasting And Its Impact On Population Health Based On Support Vector Machine
5	Automatic Identification System For Pulmonary Nodules Based On Support Vector Machine
6	Research On Diabetic Retinopathy Detection Based On Convolution Neural Network
7	Research On Decision Support System For TCM Formulation In Chronic Glomerulonephritis
8	Research On Encephalic Tissue Recognition For MR Image Based On Support Vector Machine
9	Research Of Prediction For Brucellosis Based On Machine Learning Method
10	A Comparison Of Different Machine Learning Implementations In The Diagnosis And Prognosis Prediction Of Depression